Rahal 2018


ICICDT 2018, Otranto, Italy Session A – Low Power

Low Power GDI ALU Design with Mixed Logic Adder Functionality
Jean Abou Rahal, Bassel Maamari, Basma Hajri, Rouwaida Kanj, Mohammad M. Mansour, Ali Chehab
University of Texas Austin, Austin, Texas 78712
American University of Beirut, Beirut 1107 2020, Lebanon
jeanabourahal@utexas.edu, {bfm05, bh24, rk105, mmansour, chehab}@aub.edu.lb

ABSTRACT
In this paper, we present a low power Arithmetic Logic Unit (ALU) in 90nm technology using optimized Gate Diffusion Input (GDI) design style. Inspired by the basic ten transistor adder cell compactness, we redesign the architecture of the ALU, by combining the adder and logic functionality in a modified ten transistor cell. The design is compact compared to other designs. For low power applications, we add a transmission gate to isolate the front-end logic section from the backend adder section of the same cell. This restricts power activity only to the logic section when the ALU performs logic operations, leading to more than 2X power savings compared to no-transmission gate implementation. All this comes with less than 9% increase in delay at 1.5V for an 8-bit ALU implementation, and around 50% reduction in the transistor count compared to a traditional GDI ALU implementation with separate logic and arithmetic functionality and enable signals. The design is simulated and verified using a 90nm TSMC design kit.

Keywords: Low power design, ALU, delay, GDI

Paper A3, ICICDT 2018, Otranto, Italy
978-1-5386-2550-7/18/$31.00 ©2018 IEEE

1. INTRODUCTION
With advances of technology in consumer electronics and communications systems, there is a high demand for low power and high performance integrated circuit designs. The Arithmetic Logic Unit (ALU) is a key component of any processor and its functionality is critical for digital signal processing and other communications and networking protocols. Several works have been proposed in the literature for low power ALU design. An integer ALU is typically composed of two main sections: the arithmetic unit represented by the add/subtract unit and the logic unit.
One of the most popular adder designs is the conventional 28-transistor CMOS adder [1]. Several other optimized variations have appeared, including the 16-transistor transmission gate (TG) adder [2]. The authors in [3] develop a library of CMOS and TG adder cells and compare the respective designs. Gate Diffusion Input Technique (GDI) is a new design style that enables compact low power designs. Several ALU designs have emerged using the new 10-transistor GDI-based compact adder cell [4, 5].
In this paper, we propose a novel compact cell design that combines the arithmetic and logic functionalities in one block. The logic and arithmetic units share a common front end. The backend part of arithmetic unit is isolated using a switch to guarantee low power consumption during logic-only operations. Both front end and backend operate during arithmetic operations. The impact on delay is negligible. All this comes at significant power reduction for the compact implementation of the ALU.
The remainder of the paper is organized as follows. In Section 2, we provide a background review on GDI adder cell and basic ALU design. We present the proposed compact low power design in Section 3. The corresponding results and analysis are presented in Section 4. Finally, conclusions are presented in Section 5.

2. BACKGROUND OVERVIEW
Figure 1 shows a basic two-transistor GDI cell [4]. By assigning different values to nodes G, P, and N the cell can implement logic functions, such as the AND function, which typically require more transistors in standard CMOS logic. A typical AND gate thus requires two transistors in a GDI implementation, and an XOR gate requires 4 transistors. Figure 2 presents a compact 10 transistor GDI full adder cell [4, 5]. Figure 3 illustrates a simplified 1-bit ALU implementation. It comprises a logic block and an add/subtract block [6]. A decoder is used to send the mux enable signals for the different logic or arithmetic output functions. In this example, the circuit implements AND/OR/invert logic functions.

Figure 1. 2-transistor basic GDI cell [4].

3. PROPOSED DESIGN
3.1 12T Combined logic/Arithmetic 1-bit ALU Cell
Starting from the compact 10T adder cell, we propose a novel 12T low power GDI compact cell with Combined Logic and Arithmetic functionality (CLA-GDI cell) as illustrated in Figure 4. ‘A’ and ‘B’ are the primary inputs to the circuit. By assigning P, N and G to the values specified in Table I, logic functions INV, AND, OR and XOR, along with ADD and Subtract arithmetic functions may be implemented. Thus, when N and P are assigned to 0 and Vdd, respectively, and B is assigned to G, the output at L2 performs the XOR function A⊕B. Other functions are implemented accordingly.
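
As a rough illustration of how the P, N and G assignments of Table I select a logic function at node L1, the following Python sketch models a basic GDI cell as an ideal switch pair: the PMOS passes P when G is 0 and the NMOS passes N when G is 1. This is a behavioral assumption that ignores threshold drops, timing and the second (XOR/adder) stage; the names `gdi_cell` and `l1_output` are hypothetical and do not come from the paper.

```python
VDD, GND = 1, 0  # logic levels standing in for the supply rails


def gdi_cell(g: int, p: int, n: int) -> int:
    """Idealized GDI cell: output follows P when G=0 and N when G=1."""
    return n if g else p


def l1_output(func: str, a: int, b: int) -> int:
    """Node L1 for the INV/AND/OR rows of Table I (G is always B here)."""
    if func == "INV":   # N=0, P=Vdd  -> NOT B
        return gdi_cell(g=b, p=VDD, n=GND)
    if func == "AND":   # N=A, P=0    -> A AND B
        return gdi_cell(g=b, p=GND, n=a)
    if func == "OR":    # N=Vdd, P=A  -> A OR B
        return gdi_cell(g=b, p=a, n=VDD)
    raise ValueError(f"unsupported function: {func}")


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, l1_output("AND", a, b), l1_output("OR", a, b), l1_output("INV", a, b))
```

Under this model the INV row inverts B, since B drives the gate node G while P and N are tied to the rails.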
Figure 4. Proposed low power GDI combined logic and full adder cell (CLA-GDI cell). A TG is added to eliminate activity in the backend of the arithmetic part when only logic functions are employed, hence saving power.

Table I. Input combinations for Logic/adder functionality.
Function   S0   S1   N     P     G    Output Node
INV        0    0    0     Vdd   B    L1
AND        0    1    A     0     B    L1
OR         1    0    Vdd   A     B    L1
XOR        0    0    0     Vdd   B    L2
ADD        0    0    0     Vdd   B    Sum
SUB        1    1    0     Vdd   B’   Sum

To help reduce power consumption and prevent activity from happening in the backend part of the cell when the arithmetic functions are not employed, a transmission gate (TG) is added (see shaded part in Figure 4) at node L2 at the expense of a slight increase in the delay as will be explained later. Signal S3 controls the transmission gate and is common to all bits of the ALU. This signal can be generated from the output mux signals. One may also add an inverter at L1 (not shown here) for increased logic functionality thereby allowing for NAND and NOR functions to be implemented as well. Note that placing the TG at node L1 is an alternative implementation. This implementation shields the AND/OR/INVERT functions from additional parasitic delay that may arise due to the second stage of the design for certain input combinations, as will be discussed in Section 4. However, it exposes the second, third and fourth stages to possible power activity when the logic is functioning. Thus, we opted for placing the TG at node L2 due to negligible activity in third and fourth stages of the design during logic operation, and due to the added XOR functionality. Finally, it is worth mentioning that the critical delay of the ALU is typically determined by the arithmetic functionality delay.

Figure 2. 10T GDI full adder cell [5].

Figure 3. Schematic of a 1-bit ALU [6].

3.2 ALU Organization
We discuss herein the implementation of an 8-bit ALU using the proposed CLA-GDI cell design. For the input stage, 4-to-1 multiplexers can be used to select proper N, P and G combinations for a given function as illustrated in Figure 6a. It is worth noting that N/P and G Muxes operate in parallel. For more compact and fast evaluations, we opted for an equivalent implementation that relies on 3-to-1 Mux implementations for P and N, and a 2-to-1 Mux for G. The latter is mainly used to pass the two’s complement for purposes of subtraction to G. S0/S1 combinations remain practically the same as Table I for the different logic functions. This therefore requires three control signals. The 3-to-1 Mux was designed using two 2-to-1 Muxes.

Figure 6. Multiplexer frontend for input combinations: (a) Option 1 relies on two signals based on Table I, (b) Option 2 is modified to include 3 signals for reduced number of transistors.
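
The multiplexer frontend of Section 3.2 can be pictured with a small behavioral sketch. The code below assumes the Option 1 arrangement, where S0/S1 index 4-to-1 multiplexers whose data inputs follow the N, P and G columns of Table I, and it shows one possible way to compose a 3-to-1 multiplexer from two 2-to-1 multiplexers. All function names and the select encoding of `mux3` are illustrative assumptions; the paper does not give the exact wiring of Option 2.

```python
VDD, GND = 1, 0  # logic levels standing in for the supply rails


def mux2(sel: int, d0: int, d1: int) -> int:
    """2-to-1 multiplexer: returns d1 when sel is 1, else d0."""
    return d1 if sel else d0


def mux4(s0: int, s1: int, d00: int, d01: int, d10: int, d11: int) -> int:
    """4-to-1 multiplexer indexed by (S0, S1), built from three mux2 stages."""
    return mux2(s0, mux2(s1, d00, d01), mux2(s1, d10, d11))


def frontend_option1(s0: int, s1: int, a: int, b: int):
    """Select (N, P, G) per Table I: 00 -> INV/XOR/ADD, 01 -> AND, 10 -> OR, 11 -> SUB."""
    n = mux4(s0, s1, GND, a, VDD, GND)
    p = mux4(s0, s1, VDD, GND, a, VDD)
    g = mux4(s0, s1, b, b, b, 1 - b)  # only SUB passes the complement of B
    return n, p, g


def mux3(sel_a: int, sel_b: int, d0: int, d1: int, d2: int) -> int:
    """3-to-1 multiplexer from two mux2 stages (assumed encoding: sel_b=1 picks d2)."""
    return mux2(sel_b, mux2(sel_a, d0, d1), d2)


if __name__ == "__main__":
    print(frontend_option1(0, 1, a=1, b=0))  # AND row: N=A, P=0, G=B
```

Because N and P each take only three distinct values across Table I (0, Vdd, A) and G takes two (B, B’), a 3-to-1 multiplexer for N and P and a 2-to-1 multiplexer for G are sufficient, which is consistent with the reduced-transistor option described in the text.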

Figure 7 presents a sketch of the 8-bit ALU arrangement. For the output stage, a 3-to-1 MUX is used to select one of output nodes: L1 (AND/OR/INV), L2 (XOR) or Sum (Add/Subtract). The CLA-GDI TG is ON only when we select the arithmetic functions; thus, the control signal to the transmission gate S3 is the same as the output Mux control signal. We add an inverter shared by the 8 bits of the ALU to activate TG properly as illustrated in Figure 8b.

Figure 7. ALU schematic sketch for an 8-bit ALU.

Figure 8. Multiplexer arrangement for output signal combinations (a) Adopted 3-to-1 Mux. (b) Transmission gate (TG) control signal generation.

4. RESULTS AND ANALYSIS
The ALU was implemented in 90nm TSMC technology using Cadence virtuoso schematic editor [7]. We rely on HSPICE simulations to analyze the CLA-GDI cell logic and arithmetic functionality, and the implications of the TG on the critical path delay and power savings. Throughout the simulations, the NMOS and PMOS device widths were set to 200nm and 600nm, respectively.

4.1 Logic Functionality
Figure 9 presents the front end of the cell responsible for logic switching. The XOR function output is node L2 and is not affected by any parasitics. The common AND, OR, and invert output is node L1. For these functions, a shared path may occur between L2 and L1. Thus, the logic delay may be affected by the previous logic state of L2. To explain this, we demonstrate the case of the AND function. Specifically, we consider the case when A and B switch to ‘1’. If the previous state of L2=0, the delay of the gate will be affected because both MN1 and MN2 are ON. The embedded table of Fig. 9 lists one more case where the parasitics can affect the AND gate delay. Table II presents the delay and power of the different logic functionalities in the event of no conflicting parasitics from L2. This is compared to that of the equivalent two stage CMOS implementation of the AND/OR/XOR. If there are parasitics from L2, the worst-case delay of AND/OR/INVERT can approach that of the GDI XOR gate, which is still slightly smaller than the CMOS equivalent for the AND/OR. Most importantly, the critical path delay is governed by the adder delay, which will be discussed in the following section.

Figure 9. AND operation delay considerations. (Schematic labels: P=0, G=B, N=A; transistors MP1, MP2, MN1, MN2; nodes L2 and L1, where L1 is the AND output.)

Embedded table of Figure 9: L2 implications on worst-case AND delay.
A   B   Previous L2 state   L1   Parasitics on delay
0   0   0                   0    No
0   1   0                   0    No
1   0   0                   0    No
1   1   0                   1    Yes
0   0   1                   0    No
0   1   1                   0    Yes
1   0   1                   0    No
1   1   1                   1    No

Table II. Average delay and power for the case of no conflicting parasitics from node L2. Vdd=1V. Cycle time=10ns.
       CLA-GDI                           CMOS
       Avg. delay (ps)  Avg. power (nW)  Avg. delay (ps)  Avg. power (nW)
AND    7.55             254              27.875           417
OR     10.38            251.1            29.725           467.4
INV    9.745            117.7            11.015           120.7
XOR    30.215           631.8            28.65            992

4.2 Transmission Gate Implications
It is possible to build the combined logic adder cell without the TG. However, this has significant implications on power. Here, we study TG implications on power and critical path.
Power Implications: In the absence of the TG, during a logic operation the signal may propagate all the way to the carry and sum outputs. Thus, the power consumption is similar to that of a 1-bit adder; this is around 1.3 μW for conditions similar to those of Table II. The TG, therefore, results in more than 2x reduction in power for logic operations in lieu of the 20% increase to a modified basic 1-bit adder cell (CLA-GDI with no TG, i.e., only

programmable P and N inputs).
8-bit ALU Critical Path Implications: We simulate the 8-bit ALU of Fig. 7 for Vdd ∈ [1.0V-1.8V]. We study the critical path delay of the adder functionality when the carry ripples through the different stages. Figure 10 presents the carry out waveforms. The rippling propagates through the backend stage of the cell not involving the TG; hence, the delay overhead due to the TG is small, and found to be ~9% at 1.5V (see Fig. 11).

Figure 10. Snapshot of the Carry out (Cout) waveforms of the critical path for Vdd ∈ [1V-1.8V].

Figure 11. Critical Path Delay for 8-bit Adder. Note that the GDI only has no switch and can lead to high power consumption for logic evaluations.

4.3 8-bit ALU Power Delay Product and Area
Table III presents the power delay product (PDP) for the critical path simulations for the CLA-GDI ALU. The PDP is 1.7X smaller than that of the critical path of a CMOS based design where the adder is implemented using XOR gates.

Table III. 8-bit ALU power delay product. Vdd=1.5V. Cycle=1ns.
                               Rise                     Fall                     Avg. power  Avg. delay  Normalized
                               delay (ps)  Power (uW)   delay (ps)  Power (uW)   (uW)        (ps)        Average PDP
CLA-GDI (w logic capability)   316.0       358.0        59.9        398.0        378.0       187.9       1.00
CMOS                           308.6       488.0        77.6        767.0        627.5       193.1       1.71

Finally, Table IV presents a comparison of the number of transistors for the proposed implementation (CLA-GDI) compared to that presented in Figure 3 for a traditional GDI-based ALU implementation with logic and arithmetic functionalities. The CLA-GDI design requires 208 transistors for the 8-bit ALU. The traditional design employs 416 transistors; the latter assumes using the 10T GDI adder cell and AND/OR/INVERT logic functionality with enable signals. Note that a basic CMOS implementation of Figure 3 requires more than 2X the number of devices for the GDI implementation. Finally, a very basic ALU with no enable signals (basic logic gates, 10T adder and output multiplexers) requires 224 transistors. However, such an ALU will encompass switching of the adder and logic gates simultaneously independent of the function evaluations leading to extra power overhead and is not practical.

Table IV. Comparison of the number of transistors for proposed design implementations for an 8-bit ALU using the combined cell. This is compared to a traditional GDI implementation. CMOS implementations are more than double the GDI implementation.
    Design                                                     Power Savings               # Transistors
1.  AND/OR/XOR/Invert/ADD/Sub (CLA-GDI)                        Yes via transmission gate   208
2.  AND/OR/INV/ADD/Sub (using 10T GDI adder cell) Ref [6]      Enable Signals (Fig. 3)     416

5. CONCLUSIONS
An 8-bit GDI ALU using a novel CLA-GDI compact cell for combined logic/arithmetic functionality has been implemented. The proposed GDI cell merges multiple logical and arithmetic functionalities, which are typically implemented using disjoint complex gates, into a unified cell with almost half the transistor count. To minimize power consumption, the proposed cell embeds a transmission gate to decouple the frontend from the backend of the cell, hence eliminating any switching activity in the arithmetic part when only logic operations are performed. Experimental simulations demonstrate a 2X reduction in switching power compared to no transmission gate implementation, with a graceful degradation in delay by 9% compared to the basic GDI cell. We also note 1.7X reduction in PDP compared to a standard CMOS implementation.

6. REFERENCES
[1] R. Shalem, E. John, and L.K. John, “A novel low-power energy recovery full adder cell,” in Proc. Great Lake Sympos., Feb. 1999, pp. 380–383.
[2] M. Alioto and G. Palumbo, “Analysis and comparison on full adder block in submicron technology,” IEEE Transactions on VLSI Systems, vol. 10, no. 6, pp. 806–823, Dec. 2002.
[3] A. Shams, T. Darwish, and M. Bayoumi, “Performance analysis of low-power 1-bit CMOS full adder cells,” IEEE Transactions on VLSI Systems, vol. 10, no. 1, pp. 20–29, Aug. 2002.
[4] P.-M. Lee, C.-H. Hsu, and Y.-H. Hung, “Novel 10-T full adders realized by GDI structure,” in Proc. IEEE Int. Sympos. on Integrated Circuits, Singapore, 2007, pp. 115–118.
[5] F. Moradi, et al., “Ultra low power full adder topologies,” in Proc. IEEE Int. Sympos. on Circuits and Systems (ISCAS 2009), Taipei, 2009, pp. 3158–3161.
[6] A. Tanenbaum, Structured computer organization, 7th ed. Pearson Education, 2016.
[7] https://www.cadence.com