01590080

A Self-Testing Fully Pipelined Implementation
for the Advanced Encryption Standard

Mahdi Nazm-Bojnordi, Naser Sedaghati-Mokhtari, and Seid Mehdi Fakhraie
{m.bojnordi, n.sedaghati}@ece.ut.ac.ir, fakhraie@ut.ac.ir
Silicon Intelligence and VLSI Signal Processing Laboratory
ECE Department, University of Tehran, Tehran, IRAN.
0-7803-9262-0/05/$20.00 ©2005 IEEE

Abstract- In contrast to software implementations, hardware implementations of encryption protocols provide a higher level of security and higher cryptographic speed at some cost in flexibility. In this paper, existing implementations of the Advanced Encryption Standard (AES) are considered and a fully pipelined implementation of the AES is presented. The implementation covers both encryption and decryption. The design is optimized for higher speed and lower area cost. The algorithm selected for our design is Rijndael. The major part of an AES design is the design of the substitution boxes (S-boxes). The S-boxes in our design are implemented at a lower cost than in the existing implementations. A throughput of up to 6 Gbps is achieved by the proposed architecture. The implementation is equipped with a BIST architecture for self-testing.

Index Terms- AES, self-testing, BIST, Rijndael, fully pipelined implementation.

I. INTRODUCTION

The use of encryption/decryption is as old as the art of communication. The Advanced Encryption Standard (AES) is an encryption algorithm for securing sensitive but unclassified material by U.S. Government agencies. In March 1999, the National Institute of Standards and Technology (NIST) organized the Second Advanced Encryption Standard Candidate Conference (SAESCC) in Rome, where a series of analyses of various algorithms was presented. These analyses evaluated not only the security of the candidates, but also their performance, flexibility of implementation and other issues [1]. Finally, in October 2000, NIST announced Rijndael as the winning algorithm for the AES. In many implementations Rijndael has proved to be an efficient algorithm for the AES; for example, it works very well in IPsec applications that handle thousands of security associations [2].

In November 2001, the AES was accepted as a Federal Information Processing Standard (FIPS) [3]. Since then, because of the high volume of encryption applications, many software and hardware implementations of the AES have been published, each of which optimizes different features. For example, the AES is used in many places in Internet applications, such as routers.

Generally, when performance and a high encryption or decryption speed are not a concern, a software implementation may be the appropriate choice. Besides their lower speed, software implementations leave the cipher key vulnerable: since the AES is a symmetric-key cipher, the secrecy of this single key is essential, and in software it is comparatively easy to extract. In a hardware implementation, however, it is very hard for an attacker to recover the cipher key.

As a consequence, there is a growing interest in efficient implementations of the AES. For many applications, these implementations need to be resistant against side-channel attacks, i.e. it should not be too easy to extract secret information from physical measurements on the device. Many applications, such as those working over networks, use the AES as an efficient encryption method to secure their data exchange. The AES is also an efficient and reliable method for real-time applications such as multimedia broadcasting. Real-time use of the AES requires low-delay, fast architectures. In the literature, there are AES hardware implementations for both ASICs and FPGAs; a speed of up to 609 Mbps has been reported for an ASIC implementation [4]. In this paper a fast, fully pipelined architecture for the AES is implemented that is suitable for securing data exchange in real-time applications such as video encryption.

The rest of this paper is organized as follows: the AES algorithm is explained in Section II, its hardware implementation and the BIST architecture are discussed in Section III, and the conclusions appear in Section IV.

II. ALGORITHM

This section gives a brief introduction to the algorithm selected by NIST for the AES. The Rijndael algorithm is a symmetric block cipher (a series of transformations that converts plaintext to ciphertext) that can process data blocks of 128 bits, using cipher keys with lengths of 128, 192, and 256 bits. Beyond the AES design criteria, however, the block sizes in Rijndael can mirror those of the keys [3]. Rijndael uses a variable number of rounds, depending on the key/block sizes, as follows: a) 10 rounds if the key/block size is 128 bits; b) 12 rounds if the key/block size is 192 bits; c) 14 rounds if the key/block size is 256 bits (see TABLE I). The AES block can be divided into four functional blocks; each of them operates on 128-bit input data (called the state) and prepares the 128-bit output state for the next block. The state can be thought of as an array of bytes with 4 rows, the number of columns being the block length divided by 32 (for a 128-bit block, 128/32 = 4, which gives a 4x4 array). The AES functional blocks are described in the following.

A. SubBytes Transformation

This is a non-linear transformation that operates independently on each byte of the state using a substitution table (called the S-box) [5]. The S-box is a one-to-one mapping table and consequently it is invertible. In the SubBytes step, each byte in the array is updated using an 8-bit S-box. This operation provides the non-linearity in the cipher. The transformation is constructed from the two following phases:

1. Take the multiplicative inverse in the finite (Galois) field $GF(2^8)$ [6],
2. Apply an affine transformation (over $GF(2)$) defined by

$$b_i' = b_i \oplus b_{(i+4) \bmod 8} \oplus b_{(i+5) \bmod 8} \oplus b_{(i+6) \bmod 8} \oplus b_{(i+7) \bmod 8} \oplus c_i$$

where $b_i$ is bit $i$ of the byte and $c_i$ is bit $i$ of the constant byte {63} (hexadecimal) [3].

To avoid attacks based on simple algebraic properties, the S-box is constructed by combining the inverse function with an invertible affine transformation. The S-box is also chosen to avoid any fixed points (and so is a derangement), and also any opposite fixed points.

B. ShiftRows Transformation

The ShiftRows block operates on the rows of the state; it cyclically shifts the bytes in each row by a certain offset. For AES, the first row is left unchanged. Each byte of the second row is shifted one position to the left. Similarly, the third and fourth rows are shifted by offsets of two and three respectively. In this way, each column of the output state of the ShiftRows block is composed of bytes from each column of the input state (as shown in Fig. 1). (Rijndael variants with a larger block size have slightly different offsets.) In effect, this block is used for shuffling the state.

S0,0 S0,1 S0,2 S0,3          S0,0 S0,1 S0,2 S0,3
S1,0 S1,1 S1,2 S1,3   --->   S1,1 S1,2 S1,3 S1,0
S2,0 S2,1 S2,2 S2,3          S2,2 S2,3 S2,0 S2,1
S3,0 S3,1 S3,2 S3,3          S3,3 S3,0 S3,1 S3,2

Fig. 1. ShiftRows transformation.

C. MixColumns Transformation

In the MixColumns block, the four bytes of each column of the state are combined using an invertible linear transformation. The MixColumns function takes four bytes as input and outputs four bytes, where each input byte affects all four output bytes. Together with ShiftRows, MixColumns provides diffusion in the cipher. Each column is treated as a polynomial over $GF(2^8)$ and is then multiplied modulo $x^4 + 1$ with a fixed polynomial $c(x) = 3x^3 + x^2 + x + 2$. The MixColumns block can also be viewed as a matrix multiplication in Rijndael's finite field. This transformation operates on the state column-by-column.

D. AddRoundKey Transformation

In this transformation a round key is added to the state by a simple bitwise XOR operation. Through this transformation the encryption process depends on the secret key. As will be discussed later, in each round of the encryption process the state is XORed bit-by-bit with a key pattern prepared for that round. This transformation is also used in the decryption process.

Each sequence of these four operations constitutes one round of the encryption process, and the reverse order is used for the decryption process. The number of rounds in an AES implementation depends on the key length (see TABLE I). The process is initialized by XORing the cipher key and the input data. After initialization, the state goes through the round transformations. In the last round, however, the MixColumns block does not exist. The appropriate key for each round of the process is fed to the AddRoundKey block by the KeyGenerator block. This process, which is called KeyScheduling, consists of some XOR operations and byte substitutions on the cipher key.

All of these operations are used, in the inverse order, to decrypt the ciphertext and extract the plaintext.

TABLE I
NUMBER OF ROUNDS ACCORDING TO KEY LENGTH

            Key Length    Block Size    Number of
            (Nk words)    (Nb words)    Rounds (Nr)
AES-128         4             4            10
AES-192         6             4            12
AES-256         8             4            14

III. IMPLEMENTATION AND DISCUSSION

The implementation of our proposed AES architecture is described in three major parts as follows.

A. S-box Implementation

A huge amount of the area in an AES implementation is occupied by the S-box lookup tables. Therefore, most AES implementations concentrate on optimizing the S-boxes used in the design. In some AES implementations the S-box is implemented using a lookup table (LUT), which is efficient for FPGAs but not for ASICs. An efficient method of S-box implementation represents the elements of $GF(2^8)$ in terms of elements of $GF(2^4)$, and then uses a mapping followed by an affine transformation [5].

An 8-bit S-box implemented with this method in a 0.35 µm ASIC technology consists of 762 gates and has a delay of 5 ns [7]. We have also implemented an 8-bit S-box in order to compare our results with the architecture implemented in [7]. The results obtained for our design appear in TABLE II.

TABLE II
COMPARISONS OF 8-BIT S-BOXES

              SRAM      ROM
              256x8     2x256x8    GF(2^4)    Our design
Gate count    2138      1866       762        754
Delay (ns)    -         -          5          4.88

As mentioned before, the S-box is the most frequently used component in an AES implementation. In our design, 200 instances of the S-box are used, and 81% of the design is occupied by the S-boxes.

B. AES Core Implementation

For the AES, there are several methods of implementation with respect to area or delay optimization, such as the high-throughput loop-iterative implementation presented in [7] or the loop-unrolling and pipelining implementation of [8].

The optimization goal of this paper is achieving higher speed rather than minimizing area. A block diagram of our design is illustrated in Fig. 2.

As depicted in Fig. 2, the design has an unrolled pipelined structure. To prepare the keys for the pipeline stages, there is a key generator block which takes the cipher key as its input and has eleven output ports for feeding the appropriate keys to the stages of the pipeline at the proper times. This component has a pipelined structure that lets the design work properly without any stalls. All the blocks in Fig. 2 have a 1-bit control input that specifies their operational mode. Each block is able to operate in either the encryption or the decryption process. Using this structure, the AES chip can switch between encryption and decryption modes freely. Moreover, the input cipher key may change without any pipeline stalls if needed.

Each pipeline stage has the general structure illustrated in Fig. 3. The paths of encryption and decryption are separated by two multiplexers.

Fig. 2 Unrolled & pipelined AES architecture.

Fig. 3 Each middle pipeline stage.

Fig. 4 The first pipeline stage.
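To make the polynomial view of MixColumns concrete, the following Python sketch multiplies one state column by $c(x) = 3x^3 + x^2 + x + 2$ modulo $x^4 + 1$. It is only an illustration: the byte-field reduction polynomial $x^8 + x^4 + x^3 + x + 1$ (0x11B) is taken from FIPS-197 [3] rather than from the description here, and the names `gf_mul` and `mix_column` are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return result


# Coefficients c0..c3 of c(x) = 3x^3 + x^2 + x + 2 (lowest degree first).
C = (0x02, 0x01, 0x01, 0x03)


def mix_column(column):
    """Multiply a 4-byte column, read as a polynomial, by c(x) modulo x^4 + 1.

    Reducing modulo x^4 + 1 simply wraps the exponent: x^(i+j) -> x^((i+j) % 4).
    """
    out = [0, 0, 0, 0]
    for i, c_i in enumerate(C):
        for j, a_j in enumerate(column):
            out[(i + j) % 4] ^= gf_mul(c_i, a_j)
    return out


print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
```

Because every output byte is a combination of all four input bytes, a change in a single input byte changes the whole column, which is the diffusion property mentioned in the text.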
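The two-phase S-box construction of Section II-A (multiplicative inverse followed by the affine map) can be sketched in Python as below. This is a software illustration, not the gate-level design of this paper: the reduction polynomial 0x11B and the constant 0x63 are taken from FIPS-197 [3], the inverse is computed by brute-force exponentiation, and all function names are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return result


def gf_inv(a):
    """Phase 1: multiplicative inverse in GF(2^8); 0 is mapped to 0 by convention."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):  # a^254 = a^(-1), since the nonzero elements form a group of order 255
        result = gf_mul(result, a)
    return result


def affine(b, c=0x63):
    """Phase 2: b'_i = b_i ^ b_(i+4)%8 ^ b_(i+5)%8 ^ b_(i+6)%8 ^ b_(i+7)%8 ^ c_i."""
    out = 0
    for i in range(8):
        bit = (c >> i) & 1
        for offset in (0, 4, 5, 6, 7):
            bit ^= (b >> ((i + offset) % 8)) & 1
        out |= bit << i
    return out


SBOX = [affine(gf_inv(x)) for x in range(256)]

print(hex(SBOX[0x00]), hex(SBOX[0x53]))
```

The resulting table can be checked against the properties stated in the text: it is a permutation of the 256 byte values, it has no fixed points (`SBOX[x] != x`), and it has no opposite fixed points (`SBOX[x] != x ^ 0xFF`).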
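The row offsets of the ShiftRows block can be illustrated with a few lines of Python. The state is modelled here as a list of four rows, an assumption made for readability only (a hardware implementation is just a fixed rewiring of bytes), and the function names are hypothetical.

```python
def shift_rows(state):
    """Rotate row r of a 4x4 state left by r positions (row 0 is unchanged)."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def inv_shift_rows(state):
    """Inverse operation for decryption: rotate row r right by r positions."""
    n = len(state[0])
    return [row[n - r:] + row[:n - r] for r, row in enumerate(state)]


# Label each byte by its (row, column) position, as in Fig. 1.
state = [[f"S{r},{c}" for c in range(4)] for r in range(4)]
for row in shift_rows(state):
    print(" ".join(row))
```

After the shift, every output column holds one byte from each of the four input columns, which is why the subsequent MixColumns step spreads a local change over the whole state.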
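As a behavioural illustration of the unrolled structure of Section III-B, the Python sketch below models the encryption path as a chain of stage functions fed by a key generator with eleven outputs. Several points are assumptions and not taken from the paper: the number of stages (ten, inferred from the eleven key ports), the order of operations inside a stage (the standard FIPS-197 order), the column-major byte layout, and every function name. Only the encryption mode for 128-bit keys is modelled; pipeline registers and the decryption multiplexers are omitted.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by 0x11B (polynomial from FIPS-197)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return result


def _sbox_entry(x):
    inv = 0
    if x:
        inv = 1
        for _ in range(254):          # x^254 is the inverse of x
            inv = gf_mul(inv, x)
    out = 0x63                        # affine map written as XOR of byte rotations
    for shift in range(5):
        out ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return out


SBOX = [_sbox_entry(x) for x in range(256)]


def key_generator(key):
    """Expand a 16-byte cipher key into eleven 16-byte round keys (AES-128)."""
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 0x01
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]  # rotate, then substitute
            temp[0] ^= rcon
            rcon = gf_mul(rcon, 0x02)
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return [sum(words[4 * r:4 * r + 4], []) for r in range(11)]


# The state is a flat list of 16 bytes; byte (row r, column c) sits at index 4*c + r.
def add_round_key(state, round_key):
    return [s ^ k for s, k in zip(state, round_key)]


def sub_bytes(state):
    return [SBOX[b] for b in state]


def shift_rows(state):
    return [state[4 * ((c + r) % 4) + r] for c in range(4) for r in range(4)]


def mix_columns(state):
    out = []
    for c in range(4):
        a = state[4 * c:4 * c + 4]
        out += [gf_mul(2, a[r]) ^ gf_mul(3, a[(r + 1) % 4]) ^ a[(r + 2) % 4] ^ a[(r + 3) % 4]
                for r in range(4)]
    return out


def middle_stage(state, key):
    return add_round_key(mix_columns(shift_rows(sub_bytes(state))), key)


def first_stage(state, key0, key1):
    """Middle stage with an extra AddRoundKey at its head."""
    return middle_stage(add_round_key(state, key0), key1)


def last_stage(state, key):
    """Middle stage without MixColumns."""
    return add_round_key(shift_rows(sub_bytes(state)), key)


def encrypt_block(plaintext, key):
    keys = key_generator(key)
    state = first_stage(list(plaintext), keys[0], keys[1])
    for round_key in keys[2:10]:      # eight middle stages
        state = middle_stage(state, round_key)
    return bytes(last_stage(state, keys[10]))


key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
block = bytes.fromhex("3243f6a8885a308d313198a2e0370734")
print(encrypt_block(block, key).hex())
```

In hardware each stage function would be followed by a register, so that a new 128-bit block can enter the first stage on every clock cycle while earlier blocks advance through the later stages.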
For the middle stages of the pipeline, the structure shown in Fig. 3 is used. For the first stage, however, an AddRoundKey block is added to the head of this structure (Fig. 4). In the last stage, the MixColumns and InvMixColumns blocks are removed from this structure (Fig. 5).

Fig. 5 Last pipeline stage.

C. BIST Implementation

To make our design testable, a Built-In Self-Test (BIST) architecture is added to the design. Since all the primary inputs and outputs are buffered, there is no need for a parallel pattern generator/collector. All the flip-flops in the design are extracted and modified to be connected as a scan chain. Test vectors are generated using a 32-bit LFSR (Linear Feedback Shift Register, which generates pseudo-random numbers), and signatures are collected using a 32-bit SISR (Single Input Signature Register). Fig. 6 shows the implemented BIST for our design.

Fig. 6 BIST architecture for AES.

Without the ROM unit that contains the signatures for comparison, the overhead of the implemented BIST for our design is just 636 gates. This number is negligible compared with the core gate count (636 vs. 180,176 gates). The major part of the BIST architecture is a ROM component containing the signatures, which are obtained by applying the generated test vectors to the golden model of the design at simulation time.

IV. CONCLUSIONS

This paper made a brief analysis of AES implementations and presented a testable unrolled pipelined implementation of the AES. The design is synthesized using a 0.35 µm ASIC library, for which a delay of less than 20 ns is extracted for each pipeline stage of the design. Therefore it can achieve a maximum throughput of 6 Gbps. With the added BIST architecture we obtained a test coverage of about 98%. A top-level diagram of our design is shown in Fig. 7.

Fig. 7 Implemented AES block. [Block diagram: "Testable AES" core with BIST; inputs: Input (128-bit), Cipher Key (128-bit), En/De, Clock; outputs: Output (128-bit), Faulty, Test Finished.]

REFERENCES
[1] D. Forte, "The future of the advanced encryption standard," Journal of Network Security, Elsevier Science Ltd, pp. 10-13, 1999.
[2] D. Piper, "The Internet IP Security Domain of Interpretation for ISAKMP," RFC 2407, Nov. 1998.
[3] "Advanced Encryption Standard (AES)," Federal Information Processing Standards Publication 197, Nov. 26, 2001.
[4] C. Lu and S. Tseng, "Integrated design of AES (Advanced Encryption Standard) encrypter and decrypter," in Proc. International Conference on Application-Specific Systems, pp. 277-285, Jul. 2002.
[5] V. Rijmen, "Efficient Implementation of the Rijndael S-box," available at: http://www.iaik.tu-graz.ac.at/research/krypto/AES/old/~rijmen/rijndael/sbox.pdf
[6] A. J. Menezes, P. C. van Oorschot and S. A. Vanstone, Handbook of Applied Cryptography. CRC Press, 1996.
[7] T. F. Lin, C. P. Su, C. T. Huang and C. W. Wu, "A high-throughput low-cost AES cipher chip," in Proc. IEEE Asia-Pacific Conference on ASIC, pp. 85-88, 2002.
[8] A. Hodjat and I. Verbauwhede, "A 21.54 Gbits/s fully pipelined AES processor on FPGA," in Proc. 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, pp. 308-309, Apr. 2004.
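The LFSR/SISR principle behind the BIST block can be sketched in Python as follows. The paper does not give the feedback polynomial, the seed, or the scan-chain details, so everything specific here is an assumption: the tap positions (32, 22, 2, 1) are a commonly tabulated choice for 32-bit LFSRs, `circuit` is a hypothetical stand-in that maps a test pattern to the serially scanned-out response bits, and the golden signature is obtained from a fault-free run rather than read from a ROM.

```python
MASK32 = 0xFFFFFFFF
TAPS = (32, 22, 2, 1)  # assumed feedback taps; not specified in the paper


def _feedback(register):
    bit = 0
    for tap in TAPS:
        bit ^= (register >> (tap - 1)) & 1
    return bit


def lfsr_step(state):
    """Advance the 32-bit pattern generator by one shift."""
    return ((state << 1) | _feedback(state)) & MASK32


def sisr_step(signature, response_bit):
    """Shift one response bit into the 32-bit single-input signature register."""
    return ((signature << 1) | (_feedback(signature) ^ (response_bit & 1))) & MASK32


def run_bist(circuit, seed, n_patterns, golden):
    """Apply n_patterns pseudo-random patterns and compress all responses.

    Returns (signature, faulty), where faulty is True if the signature
    differs from the golden one.
    """
    pattern, signature = seed, 0
    for _ in range(n_patterns):
        pattern = lfsr_step(pattern)
        for bit in circuit(pattern):
            signature = sisr_step(signature, bit)
    return signature, signature != golden


def good_circuit(pattern):
    """Hypothetical fault-free circuit under test: exposes four pattern bits."""
    return [(pattern >> i) & 1 for i in (0, 7, 19, 31)]


# The golden signature comes from the fault-free model, as done at simulation time.
golden_signature, _ = run_bist(good_circuit, seed=1, n_patterns=64, golden=0)
print(hex(golden_signature))
```

Only the 32-bit signature has to be stored and compared, which is why the compaction logic itself stays small; a faulty response stream is flagged whenever it leaves the register in a state different from the golden signature.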
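As a rough check of the throughput figure quoted in the conclusions, assume (the paper does not state this explicitly) that the full pipeline accepts one 128-bit block per clock cycle and that the clock period $T_{clk}$ equals the worst-case stage delay. With the reported bound of 20 ns on the stage delay:

```latex
\[
\text{Throughput} \;=\; \frac{128\ \text{bit}}{T_{clk}}
\;\approx\; \frac{128\ \text{bit}}{20\ \text{ns}}
\;=\; 6.4 \times 10^{9}\ \text{bit/s}
\;=\; 6.4\ \text{Gbps}
\]
```

Under these assumptions the estimate is in line with the reported throughput of about 6 Gbps, and roughly an order of magnitude above the 609 Mbps cited for [4].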