implementations are concentrated on optimizing the S-boxes used in the design. In some AES implementations the S-box is implemented as a LUT, which is efficient for FPGAs but not for ASICs. An efficient method of S-box implementation represents the polynomial elements of GF(2^8) as elements of GF(2^4), and then uses a mapping followed by an affine transformation [5].

An 8-bit S-box implemented with this method in a 0.35 µm ASIC technology consists of 762 gates and has a delay of 5 ns [7]. We have also implemented an 8-bit S-box to compare our results with the architecture in [7]. The results obtained for our design appear in TABLE II.

As mentioned before, the S-box is the most frequently used component in an AES implementation. In our design, 200 instances of the S-box are used, and they occupy 81% of the design.

B. AES Core Implementation

For AES, there are several methods of implementation with respect to area or delay optimization, such as the high-throughput loop-iterative implementation presented in [7] or the loop-unrolled, pipelined implementation in [8]. The optimization concern of this paper is achieving higher speed rather than minimizing area. A block diagram of our design is illustrated in Fig. 2.

As depicted in Fig. 2, the design has an unrolled pipelined structure. To prepare the keys for the pipeline stages, a key generator block takes the cipher key as its input and has eleven output ports that feed the appropriate keys to the pipeline stages at the proper times. This component is itself pipelined, which lets the design work without any stalls. Every block in Fig. 2 has a 1-bit control input that specifies its operational mode, so each block can operate for either encryption or decryption. Using this structure, the AES chip can switch between encryption and decryption modes freely. Moreover, the input cipher key may change without any pipeline stalls if needed.

Each pipeline stage has the general structure illustrated in Fig. 3. The paths of encryption and decryption are separated by two multiplexers.
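The eleven key generator outputs mentioned above match the eleven round keys that the AES key schedule of [3] derives from a 128-bit cipher key. The following is a minimal software sketch of that schedule, not a model of our hardware. It assumes a 128-bit key, and it builds the S-box by direct inversion in GF(2^8) followed by the affine transformation, rather than by the GF(2^4) decomposition of [5]. All function names are illustrative.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:          # reduce when the degree reaches 8
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse in GF(2^8), computed as a^254; 0 maps to 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result if a else 0


def affine(x):
    """Affine transformation applied after the field inversion."""
    def rotl(v, n):
        return ((v << n) | (v >> (8 - n))) & 0xFF
    return x ^ rotl(x, 1) ^ rotl(x, 2) ^ rotl(x, 3) ^ rotl(x, 4) ^ 0x63


# S-box table: inversion in GF(2^8) followed by the affine transformation.
SBOX = [affine(gf_inv(x)) for x in range(256)]


def expand_key_128(key):
    """Return the eleven 16-byte round keys for a 16-byte cipher key."""
    assert len(key) == 16
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 1
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]        # RotWord
            temp = [SBOX[b] for b in temp]    # SubWord
            temp[0] ^= rcon                   # round constant
            rcon = gf_mul(rcon, 2)
        words.append([t ^ w for t, w in zip(temp, words[i - 4])])
    # Group the 44 words into eleven 128-bit round keys.
    return [bytes(sum(words[4 * r:4 * r + 4], [])) for r in range(11)]


round_keys = expand_key_128(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(len(round_keys), round_keys[10].hex())
```

In an unrolled pipeline, round key r feeds stage r during encryption. Decryption consumes the same keys in reverse order, which is one reason the key generator needs the mode input.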
For the middle stages of the pipeline, this structure is used as shown in Fig. 3. For the first stage, however, an AddRoundKey block is added to the head of this structure (Fig. 4). For the last stage, the MixColumns and InvMixColumns blocks are removed from this structure (Fig. 5).

A. BIST Implementation

To make our design testable, a Built-In Self-Test (BIST) architecture is added to the design. Since all the primary inputs and outputs are buffered, there is no need for a parallel pattern generator/collector. All the flip-flops in the design are extracted and modified to be connected as a scan chain. Test vectors are generated using a 32-bit LFSR (Linear Feedback Shift Register, which generates pseudo-random numbers), and signatures are collected using a 32-bit SISR (Single Input Signature Register). Fig. 6 shows the implemented BIST for our design.

Fig. 6 BIST architecture for AES.

Without the ROM unit that contains the signatures for comparison, the overhead of the implemented BIST for our design is just 636 gates. This is so much smaller than the core gate count that it can be ignored (636 vs. 180176). The major part of the BIST architecture is a ROM component containing the signatures, which are obtained by applying the generated test vectors to the golden model of the design at simulation time.

IV. CONCLUSIONS

This paper made a brief analysis of AES implementations and presented a testable unrolled pipelined implementation of AES. The design is synthesized using a 0.35 µm ASIC library, for which a delay of less than 20 ns is extracted for each pipeline stage of the design. Therefore, processing one 128-bit block per clock cycle, it can achieve a maximum throughput of 6 Gbps. With the added BIST architecture we obtained a test coverage of about 98%. A top-level diagram of our design is shown in Fig. 7.

[Figure: block labeled "Testable AES"; signal labels: Input (128-bit), Cipher Key (128-bit), BIST, En/De, Clock, Output, Faulty, Test Finished.]
Fig. 7 Implemented AES block.

REFERENCES
[1] D. Forte, "The future of the advanced encryption standard," in Journal of Network Security, Elsevier Science Ltd, pp. 10-13, 1999.
[2] D. Piper, "The Internet IP Security Domain of Interpretation for ISAKMP," RFC 2407, Nov. 1998.
[3] "Advanced Encryption Standard (AES)," Federal Information Processing Standards Publication 197, Nov. 26, 2001.
[4] C. Lu, S. Tseng, "Integrated design of AES (Advanced Encryption Standard) encrypter and decrypter," in Proc. International Conference on Application-Specific Systems, pp. 277-285, Jul. 2002.
[5] V. Rijmen, "Efficient Implementation of the Rijndael S-box," Available: http://www.iaik.tu-graz.ac.at/research/krypto/AES/old/~rijmen/rijndael/sbox.pdf
[6] A. J. Menezes, P. C. van Oorschot and S. A. Vanstone, Handbook of Applied Cryptography. CRC Press, 1996.
[7] T. F. Lin, C. P. Su, C. T. Huang and C. W. Wu, "A High-throughput low-cost AES cipher chip," in Proc. IEEE Asia-Pacific Conference on ASIC, pp. 85-88, 2002.
[8] A. Hodjat, W. Ingrid, "A 21.54 Gbits/s fully pipelined processor on FPGA," 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, pp. 308-309, April 2004.
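As a rough illustration of the flow described in the BIST Implementation section, the sketch below generates scan patterns with a 32-bit LFSR, compacts the scanned-out responses in a 32-bit SISR, and compares the result with a golden signature obtained from a fault-free run. It is a software toy under stated assumptions, not our circuit: the tap positions, the seed, the scan chain length, and the `toy_logic` stand-in for the logic under test are all hypothetical choices made for the example.

```python
MASK32 = 0xFFFFFFFF
TAPS = (32, 22, 2, 1)   # illustrative tap positions (assumption)


def feedback(state):
    """XOR of the tapped bits of a 32-bit register."""
    fb = 0
    for t in TAPS:
        fb ^= (state >> (t - 1)) & 1
    return fb


def lfsr_bits(seed, count):
    """Yield `count` pseudo-random bits from a 32-bit LFSR (seed must be nonzero)."""
    state = seed & MASK32
    for _ in range(count):
        bit = feedback(state)
        state = ((state << 1) | bit) & MASK32
        yield bit


def sisr_compact(bits, state=0):
    """Compact a serial bit stream into a 32-bit signature."""
    for b in bits:
        state = ((state << 1) | (feedback(state) ^ b)) & MASK32
    return state


def toy_logic(chain):
    """Hypothetical stand-in for the logic between the scan flip-flops."""
    n = len(chain)
    return [chain[i] ^ (chain[(i + 1) % n] & chain[(i + 2) % n]) for i in range(n)]


def run_bist(seed, chain_len, n_patterns, fault=None):
    """Scan in LFSR patterns, capture responses, and return the SISR signature.

    `fault` is an optional (pattern_index, bit_index) pair that inverts one
    captured response bit, modelling a single-bit error.
    """
    stream = lfsr_bits(seed, chain_len * n_patterns)
    responses = []
    for p in range(n_patterns):
        chain = [next(stream) for _ in range(chain_len)]   # scan in
        out = toy_logic(chain)                             # capture
        if fault is not None and fault[0] == p:
            out[fault[1]] ^= 1                             # inject the error
        responses.extend(out)                              # scan out
    return sisr_compact(responses)


# Golden signature from the fault-free model; in hardware it would sit in the ROM.
GOLDEN = run_bist(seed=0xACE1ACE1, chain_len=64, n_patterns=100)
faulty = run_bist(seed=0xACE1ACE1, chain_len=64, n_patterns=100, fault=(3, 5))
print("fault detected:", faulty != GOLDEN)
```

A single inverted response bit always changes the signature here, because the SISR state update is linear and invertible. Multiple errors can in principle cancel (aliasing), which is one reason reported test coverage stays below 100%.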