Rahal 2018
978-1-5386-2550-7/18/$31.00 ©2018 IEEE
Paper A3 ICICDT 2018, Otranto, Italy
ICICDT 2018, Otranto, Italy Session A – Low Power
Figure 7 presents a sketch of the 8-bit ALU arrangement. For
the output stage, a 3-to-1 MUX is used to select one of output
nodes: L1 (AND/OR/INV), L2 (XOR) or Sum
(Add/Subtract). The CLA-GDI TG is ON only when we
select the arithmetic functions; thus, the control signal to the
transmission gate S3 is the same as the output Mux control
signal. We add an inverter shared by the 8 bits of the ALU
to activate TG properly as illustrated in Figure 8b.

to 200nm and 600nm, respectively.

4.1 Logic Functionality
Figure 9 presents the front end of the cell responsible for
logic switching. The XOR function output is node L2 and is
not affected by any parasitics. The common AND, OR, and
invert output is node L1. For these functions, a shared path
may occur between L2 and L1. Thus, the logic delay may be
affected by the previous logic state of L2. To explain this,
we demonstrate the case of the AND function. Specifically,
we consider the case when A and B switch to ‘1’. If the
previous state of L2=0, the delay of the gate will be affected
because both MN1 and MN2 are ON. The embedded table
of Fig. 9 lists one more case where the parasitics can affect
the AND gate delay. Table II presents the delay and power
of the different logic functionalities in the event of no
conflicting parasitics from L2. These figures are compared to
those of the equivalent two-stage CMOS implementation of the
AND/OR/XOR gates. If there are parasitics from L2, the
worst-case delay of AND/OR/INVERT can approach that of the
GDI XOR gate, which is still slightly smaller than that of the
CMOS equivalent for the AND/OR. Most importantly, the
critical path delay is governed by the adder delay, which will
be discussed in the following section.
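
The output selection described for Figure 7 and the L1/L2 nodes of Figure 9 can be summarised with a small behavioural model. This is only a sketch under stated assumptions, not the authors' circuit: the `MUX_NODE` mapping, the function names, the choice of input A for the invert function, and the textbook full-adder equations used for the Sum path are all hypothetical stand-ins, and subtraction is left out.

```python
from enum import Enum


class Op(Enum):
    AND = "and"
    OR = "or"
    INV = "inv"
    XOR = "xor"
    ADD = "add"


# Hypothetical mapping from operation to the output node picked by the
# 3-to-1 MUX; the paper names the nodes but not the select encoding.
MUX_NODE = {Op.AND: "L1", Op.OR: "L1", Op.INV: "L1", Op.XOR: "L2", Op.ADD: "Sum"}


def bit_slice(op, a, b, cin=0):
    """Behavioural model of one bit: returns (out, cout, tg_on)."""
    node = MUX_NODE[op]
    tg_on = node == "Sum"  # TG shares the MUX control: ON only for arithmetic
    if node == "L1":  # AND / OR / invert share node L1
        out = {Op.AND: a & b, Op.OR: a | b, Op.INV: a ^ 1}[op]
        return out, 0, tg_on
    l2 = a ^ b  # XOR output node
    if node == "L2":
        return l2, 0, tg_on
    # Assumption: textbook full-adder equations stand in for the cell backend.
    return l2 ^ cin, (a & b) | (l2 & cin), tg_on


def alu8(op, x, y, cin=0):
    """Apply op over 8 bits; the carry ripples only when the TG is ON."""
    result, carry = 0, cin
    for i in range(8):
        out, cout, tg_on = bit_slice(op, (x >> i) & 1, (y >> i) & 1, carry)
        result |= out << i
        if tg_on:
            carry = cout
    return result, carry
```

In this model, `alu8(Op.ADD, 200, 100)` returns `(44, 1)`, while the logic operations never touch the carry chain, mirroring the idea that the arithmetic backend stays idle when the TG is OFF.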
[Figure 9 schematic labels: transistors MP1, MP2, MN1, MN2; inputs P=0, G=B, N=A; nodes L2 and L1 (AND output).]

L2 implications on worst-case AND delay
A   B   Previous state L2   L1   L2 parasitics on delay
0   0   0                   0    No
0   1   0                   0    No
1   0   0                   0    No
1   1   0                   1    Yes
0   0   1                   0    No
0   1   1                   0    Yes
1   0   1                   0    No
1   1   1                   1    No
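
The rows of the embedded table of Fig. 9 can be encoded and checked mechanically. The sketch below copies the eight rows as data and tests one compact reading of them: a parasitic effect is flagged when B=1 and the previous state of L2 differs from A. That pattern is an observation that happens to fit the table, not a rule stated in the paper, and the names `ROWS`, `parasitic_pattern`, and `check_table` are hypothetical.

```python
# Rows copied from the embedded table of Fig. 9:
# (A, B, previous state of L2, L1, parasitic effect on delay)
ROWS = [
    (0, 0, 0, 0, False),
    (0, 1, 0, 0, False),
    (1, 0, 0, 0, False),
    (1, 1, 0, 1, True),
    (0, 0, 1, 0, False),
    (0, 1, 1, 0, True),
    (1, 0, 1, 0, False),
    (1, 1, 1, 1, False),
]


def l1_and(a, b):
    """AND output at node L1 (GDI cell with P=0, G=B, N=A)."""
    return a & b


def parasitic_pattern(a, b, prev_l2):
    """Observed pattern only: flagged when B=1 and previous L2 differs from A."""
    return b == 1 and prev_l2 != a


def check_table(rows=ROWS):
    """True if every row matches both the AND function and the pattern."""
    return all(
        l1_and(a, b) == l1 and parasitic_pattern(a, b, prev_l2) == flagged
        for a, b, prev_l2, l1, flagged in rows
    )
```

Calling `check_table()` returns `True` for the eight rows above, which includes the case discussed in the text (A and B switching to '1' with L2 previously at 0).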
programmable P and N inputs).

8-bit ALU Critical Path Implications: We simulate the 8-bit ALU
of Fig. 7 for Vdd ∈ [1.0V-1.8V]. We study the critical path delay
of the adder functionality when the carry ripples through the
different stages. Figure 10 presents the carry out waveforms. The
rippling propagates through the backend stage of the cell, which
does not involve the TG; hence, the delay overhead due to the TG
is small, and is found to be ~9% at 1.5V (see Fig. 11).

Figure 10. Snapshot of the Carry out (Cout) waveforms of the
critical path for Vdd ∈ [1V-1.8V].

Figure 11. Critical Path Delay for 8-bit Adder. Note that the
GDI-only design has no switch and can lead to high power
consumption for logic evaluations.

4.3 8-bit ALU Power Delay Product and Area
Table III presents the power delay product (PDP) for the
critical path simulations for the CLA-GDI ALU. The PDP is
1.7X smaller than that of the critical path of a CMOS-based
design where the adder is implemented using XOR gates.

Table III. 8-bit ALU power delay product. Vdd=1.5V. Cycle=1ns.
Design                         Rise delay  Rise power  Fall delay  Fall power  Avg. power  Avg. delay  Normalized
                               (ps)        (uW)        (ps)        (uW)        (uW)        (ps)        average PDP
CLA-GDI (w logic capability)   316.0       358.0       59.9        398.0       378.0       187.9       1.00
CMOS                           308.6       488.0       77.6        767.0       627.5       193.1       1.71

Finally, Table IV presents a comparison of the number of
transistors for the proposed implementation (CLA-GDI)
compared to that presented in Figure 3 for a traditional GDI-based
ALU implementation with logic and arithmetic functionalities.
The CLA-GDI design requires 208 transistors for the 8-bit ALU.
The traditional design employs 416 transistors; the latter
assumes using the 10T GDI adder cell and AND/OR/INVERT logic
functionality with enable signals. Note that a basic CMOS
implementation of Figure 3 requires more than 2X the number of
devices of the GDI implementation. Finally, a very basic ALU
with no enable signals (basic logic gates, 10T adder and output
multiplexers) requires 224 transistors. However, such an ALU
switches the adder and the logic gates simultaneously,
independent of the function being evaluated, which leads to
extra power overhead and is not practical.

Table IV. Comparison of the number of transistors for
proposed design implementations for an 8-bit ALU using the
combined cell. This is compared to a traditional GDI
implementation. CMOS implementations are more than double
the GDI implementation.
#    Design                                           Power Savings                       # Transistors
1.   AND/OR/XOR/Invert/ADD/Sub (CLA-GDI)              Yes via transmission gate           208
2.   AND/OR/INV/ADD/Sub (using 10T GDI adder cell)    Enable Signals (Fig. 3), Ref [6]    416

5. CONCLUSIONS
An 8-bit GDI ALU using a novel CLA-GDI compact cell for
combined logic/arithmetic functionality has been
implemented. The proposed GDI cell merges multiple
logical and arithmetic functionalities, which are typically
implemented using disjoint complex gates, into a unified cell
with almost half the transistor count. To minimize power
consumption, the proposed cell embeds a transmission gate
to decouple the frontend from the backend of the cell, hence
eliminating any switching activity in the arithmetic part
when only logic operations are performed. Simulations
demonstrate a 2X reduction in switching power compared to
an implementation with no transmission gate, with a graceful
degradation in delay of 9% compared to the basic GDI cell.
We also note a 1.7X reduction in PDP compared to a standard
CMOS implementation.

6. REFERENCES
[1] R. Shalem, E. John, and L.K. John, “A novel low-power energy recovery full adder cell,” in Proc. Great Lake Sympos., Feb. 1999, pp. 380–383.
[2] M. Alioto and G. Palumbo, “Analysis and comparison on full adder block in submicron technology,” IEEE Transactions on VLSI Systems, vol. 10, no. 6, pp. 806–823, Dec. 2002.
[3] A. Shams, T. Darwish, and M. Bayoumi, “Performance analysis of low-power 1-bit CMOS full adder cells,” IEEE Transactions on VLSI Systems, vol. 10, no. 1, pp. 20–29, Aug. 2002.
[4] P.-M. Lee, C.-H. Hsu, and Y.-H. Hung, “Novel 10-T full adders realized by GDI structure,” in Proc. IEEE Int. Sympos. on Integrated Circuits, Singapore, 2007, pp. 115–118.
[5] F. Moradi, et al., “Ultra low power full adder topologies,” in Proc. IEEE Int. Sympos. on Circuits and Systems (ISCAS 2009), Taipei, 2009, pp. 3158–3161.
[6] A. Tanenbaum, Structured computer organization, 7th ed. Pearson Education, 2016.
[7] https://www.cadence.com
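
As a closing arithmetic check, the headline ratios quoted from Table III and Table IV can be re-derived from the tabulated values. The sketch below assumes that the PDP is the product of the average power and the average delay, which the paper does not state explicitly but which reproduces the normalized value in Table III; the dictionary and function names are hypothetical.

```python
# Rise/fall delay (ps) and power (uW) at Vdd=1.5V, copied from Table III.
TABLE_III = {
    "CLA-GDI": {"rise_ps": 316.0, "rise_uw": 358.0, "fall_ps": 59.9, "fall_uw": 398.0},
    "CMOS": {"rise_ps": 308.6, "rise_uw": 488.0, "fall_ps": 77.6, "fall_uw": 767.0},
}

# 8-bit ALU transistor counts quoted for Table IV.
TRANSISTORS = {"CLA-GDI": 208, "traditional GDI": 416}


def averages(row):
    """Return (average power in uW, average delay in ps) over rise and fall."""
    avg_power_uw = (row["rise_uw"] + row["fall_uw"]) / 2
    avg_delay_ps = (row["rise_ps"] + row["fall_ps"]) / 2
    return avg_power_uw, avg_delay_ps


def pdp_fj(row):
    """Assumed definition: PDP = average power x average delay, in fJ."""
    power_uw, delay_ps = averages(row)
    return power_uw * delay_ps * 1e-3  # uW * ps = 1e-18 J = 1e-3 fJ


pdp_ratio = pdp_fj(TABLE_III["CMOS"]) / pdp_fj(TABLE_III["CLA-GDI"])
transistor_ratio = TRANSISTORS["traditional GDI"] / TRANSISTORS["CLA-GDI"]

print(f"PDP ratio (CMOS / CLA-GDI): {pdp_ratio:.2f}")
print(f"Transistor ratio (traditional GDI / CLA-GDI): {transistor_ratio:.1f}")
```

Under that assumption the script prints a PDP ratio of 1.71 and a transistor ratio of 2.0, matching the normalized PDP in Table III and the 208 versus 416 transistor counts.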