
2017 IEEE Computer Society Annual Symposium on VLSI

A General Design Framework for Sparse Parallel Prefix Adders

Soumya Banerjee and Wenjing Rao
ECE Department, University of Illinois at Chicago, Chicago, IL 60607, USA
{sbaner8, wenjing}@uic.edu

Abstract—This paper proposes a design framework for a family of Parallel Prefix Adders (PPA), based on the concept of sparsity. The proposed design is achieved via a generalization of the specific property of Group Segregation presented in Kogge-Stone Adders (KSA), where the computations of even and odd bits (forming two groups) are mutually disjoint. Furthermore, the two disjoint groups can produce the computations of each other with a reasonable hardware and performance overhead. This opens up the potential of a sparse version using only one group and extending its computation to cover all the bits. In this paper, we generalize the formation of Group Segregation beyond the special case of KSA, and propose a framework to design a plethora of sparse PPA designs. These new sparse PPA's can be formed based on two dimensions of choice: the original type of PPA and the group number. As a result, the proposed design framework provides a significant boost in the PPA design trade-off space.

I. INTRODUCTION

Parallel Prefix Adders (PPA's) provide a unifying foundation for designing a large variety of adders. Sparse PPA's have been proposed in [1], [2] to explore additional performance-power trade-offs. Sparse PPA's are constructed by allowing only a subset of bits (denoted as "Essential Bits" in this paper) of the entire adder to compute their own results, with the rest of the bits (denoted as "Non-Essential Bits") deriving their results from the Essential Bits. Such a "sparse" design enables more varieties of PPA structures and thus facilitates design space trade-off explorations.

In the field of general PPA design, [3], [4] propose ILP-based approaches to obtain a PPA structure satisfying a given set of constraints such as area, power, and performance. These approaches are able to search within the huge design space of possible structures to obtain a PPA that meets the specifications. However, the runtime complexity is high, and these approaches do not provide a ready set of PPA's for designers to choose from.

Among the previous works: [5] proposes Sparse PPA's by considering the radix and lateral fanout at each stage of the adder; [1] focuses on the optimal sparseness of Kogge-Stone and Ladner-Fischer Adders to minimize power; [2] proposed an alternative sparse Ladner-Fischer Adder structure; [6] evaluated various parameters (structure, wire length, sparseness, etc.) to understand the power-performance trade-offs. Other examples of Sparse PPA's include the Conditional Sum Adder [7] and the Carry Select Adder [8]. Overall, most of the previous works are either limited to a narrow scope or ad-hoc in nature. A systematic framework for constructing Sparse PPA's in general is lacking.

This paper focuses on general Sparse PPA's. A framework is proposed to derive various sparse designs for any existing PPA structure. The resultant approach opens up a large set of available PPA designs, with various combinations of structures and sparsities, such that a designer can choose among them along the Pareto front. The motivation stems from the special properties displayed in Kogge-Stone Adders (KSA's), where the bits are divided into two groups (even and odd bits), and the computations of each group are inherently segregated. Furthermore, the prefix computation property ensures that the computation result of one group (even or odd bits) can be derived from the other group with a small overhead. We extend this principle to arbitrary PPA structures, as a general design framework for Sparse PPA's.

This work is supported by NSF Grant CNS-1149661.

II. PRELIMINARIES

PPA's [9] use three steps to compute the sum bits:

1. Bit-level Generate and Propagate: for each input bit pair Ai and Bi, the bit-level Generate and Propagate signals (Gi:i, Pi:i) are computed as:

(Gi:i, Pi:i) = (Ai · Bi, Ai ⊕ Bi)

2. Block-level Generate and Propagate: this stage derives (Gi:0, Pi:0) for all bits i. Such results can be computed via the following prefix formulation of sub-blocks:

(Gi:j, Pi:j) = (Gi:k1, Pi:k1) ◦ (Gk1:k2, Pk1:k2) ◦ (Gk2:k3, Pk2:k3) ◦ ... ◦ (Gkn:j, Pkn:j)

where i ≥ k1 ≥ k2 ≥ k3 ≥ ... ≥ kn ≥ j. The "◦" operator is defined as:

(Gi:k, Pi:k) ◦ (Gk:j, Pk:j) = (Gi:k + Pi:k · Gk:j, Pi:k · Pk:j)

where i ≥ k ≥ j.

3. Carry and Sum computation: the Gi:0 and Pi:0 of all the bits are used to compute the Carry Ci and Sum Si in parallel:

Ci = Gi−1:0, Si = Pi:i ⊕ Ci
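The three steps above can be sketched in C (all helper names are ours, not from the paper; the stage-2 prefix here is a plain sequential scan, which, because the "◦" operator is associative, yields the same block GP results as any tree-structured network):

```c
#include <stdint.h>

/* The "o" operator: (Gi:k, Pi:k) o (Gk:j, Pk:j) = (Gi:k + Pi:k*Gk:j, Pi:k*Pk:j) */
typedef struct { int g, p; } gp_t;

static gp_t gp_op(gp_t hi, gp_t lo) {
    return (gp_t){ hi.g | (hi.p & lo.g), hi.p & lo.p };
}

/* 16-bit addition via the three PPA steps; returns a 17-bit result. */
uint32_t ppa_add16(uint16_t a, uint16_t b) {
    gp_t gp[16], blk[16];
    for (int i = 0; i < 16; i++) {          /* Step 1: (Gi:i, Pi:i) */
        int ai = (a >> i) & 1, bi = (b >> i) & 1;
        gp[i] = (gp_t){ ai & bi, ai ^ bi };
    }
    blk[0] = gp[0];
    for (int i = 1; i < 16; i++)            /* Step 2: (Gi:0, Pi:0) */
        blk[i] = gp_op(gp[i], blk[i - 1]);
    uint32_t sum = 0;
    for (int i = 0; i < 16; i++) {          /* Step 3: Ci = Gi-1:0, Si = Pi:i ^ Ci */
        int ci = (i == 0) ? 0 : blk[i - 1].g;
        sum |= (uint32_t)(gp[i].p ^ ci) << i;
    }
    sum |= (uint32_t)blk[15].g << 16;       /* carry-out */
    return sum;
}
```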

2159-3477/17 $31.00 © 2017 IEEE
DOI 10.1109/ISVLSI.2017.48
Fig. 1: Various Parallel Prefix Adder structures in stage 2 to compute the Block GP results: (a) Kogge-Stone, (b) Han-Carlson, (c) Ladner-Fischer, (d) Ripple-Carry, (e) an arbitrary PPA. Each takes the inputs (Gi:i, Pi:i) at the top and produces the outputs (Gi:0, Pi:0) at the bottom.
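Because "◦" is associative, the different stage-2 networks of Figure 1 all compute identical block GP results; only their depth, block count, and wiring differ. A small C check of this equivalence (an illustrative sketch; helper names ours), comparing a Kogge-Stone-style log-depth scan against a Ripple-Carry-style sequential scan:

```c
/* (Gh,Ph) o (Gl,Pl) = (Gh + Ph*Gl, Ph*Pl) */
typedef struct { int g, p; } gp_t;

static gp_t gp_op(gp_t hi, gp_t lo) {
    return (gp_t){ hi.g | (hi.p & lo.g), hi.p & lo.p };
}

/* Kogge-Stone-style stage 2: log2(n) levels; at the level with span d,
 * bit i combines with bit i-d. Processing i from high to low keeps the
 * in-place update reading the previous level's values. */
static void prefix_kogge_stone(gp_t *blk, const gp_t *gp, int n) {
    for (int i = 0; i < n; i++) blk[i] = gp[i];
    for (int d = 1; d < n; d <<= 1)
        for (int i = n - 1; i >= d; i--)
            blk[i] = gp_op(blk[i], blk[i - d]);
}

/* Ripple-Carry-style stage 2: n-1 sequential levels. */
static void prefix_ripple(gp_t *blk, const gp_t *gp, int n) {
    blk[0] = gp[0];
    for (int i = 1; i < n; i++)
        blk[i] = gp_op(gp[i], blk[i - 1]);
}
```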

Fig. 2: (a) KSA structure with the Sub-Adders shown; (b) 2-Sparse KSA: removing a Sub-Adder and introducing an additional level of GP computation to compute the final results for the bits affected by the removal of the Sub-Adder; (c) replacing the KSA Sub-Adder with a Ladner-Fischer (LFA) Sub-Adder to form a 2-Sparse LFA.

Obviously, the most critical part of a PPA is stage 2, in terms of area, power, and performance. In this crucial stage, the Block-level Generate and Propagate (Gi:0, Pi:0) is computed for every bit i, and each of these computations relies on the preceding bits from (i − 1) down to 0. Therefore the entire circuit consists of a large network of Generate and Propagate blocks. Since the "◦" operator is associative, the block-level Generate and Propagate computations can be done in arbitrary order. Various PPA's thus use distinct architectures for this stage. Figure 1 shows the stage 2 structures of various PPA's, including Kogge-Stone (KSA), Han-Carlson (HCA), Ladner-Fischer (LFA), Ripple-Carry (RCA), and an arbitrary PPA, with each circle representing one "◦" operation, or GP Block.

III. SPARSITY OF ADDERS

In general, a k-Sparse adder computes the Block-level GP results (stage 2 of a PPA) only for every k-th bit (denoted as Essential Bits hereafter), and uses their results to derive the results (of stage 2) for the remaining bits (denoted as Non-Essential Bits).

A. From 2-Sparse KSA to 2-Sparse General PPA

The motivation for a general sparse PPA comes from the inherent Group Segregation property of KSA's [10]. A 16-bit KSA is shown in Figure 1(a). Compared to the other designs, KSA is known to employ a large amount of hardware (with redundancy) to maximize parallelism. This enables the fastest computation of block GP results for all the bits.

In fact, two properties make KSA a distinct candidate for a sparse design. First, the massive hardware in a KSA is organized in a highly regular fashion: the computations of even bits and odd bits are entirely segregated and thus carried out independently. Second, due to the prefix-computation nature of block GP generation, the result of each bit can be derived (with a small overhead) from its immediate lower neighboring bit. This means that the two groups of even and odd bits are mutually substitutive. From these two properties, it is easy to derive a 2-Sparse KSA design, by keeping only one group (the odd bits, for instance) as Essential Bits and using its results to derive the results of the Non-Essential Bits (the even bits, in this case). The resulting 2-Sparse KSA is in fact isomorphic to the HCA shown in Figure 1(b).

KSA's inherent hardware redundancy and regularity is not generally present in other PPA's. To make a general 2-Sparse version based on an arbitrary PPA structure, we make the following observations by re-organizing the bits of a KSA as shown in Figure 2(a). Here it becomes clear that the two groups (odd and even bits, shown in different patterns) each by itself forms a structure that is isomorphic to a KSA, albeit

half in size (8-bit). In other words, the segregation of groups in a KSA not only presents the properties of independence and mutual substitution, but also forms two sub-KSA structures. A further investigation reveals that these two 8-bit KSA's (denoted as Sub-Adders) become completely disjoint after the first layer of GP-blocks from the input side. Essentially, the first level of GP operations gathers enough information for each Sub-Adder, such that each can carry out its own computations independently.

Figure 2(b) then shows that the 2-Sparse version of a KSA is achieved in three steps: 1) segregating even and odd bits by the first layer of GP operators; 2) eliminating the computations of one group (the even bits) as Non-Essential Bits; 3) deriving the final results of the Non-Essential Bits by adding a layer of GP-blocks at the output end, fed from the results of the Essential Bits.

From the above observations it is then possible to generalize a 2-Sparse KSA with other types of Sub-Adders, as shown in the example of Figure 2(c). Note that as long as the first layer of GP-blocks (denoted as Group-Segregation) is present, two independent and mutually substitutable groups (even and odd bits) are formed. From then on, one group (the odd bits) can be designated as the Essential Bits, and it can use any form of PPA structure as a Sub-Adder. The associated GP operators for the other group are eliminated, as these even bits now constitute the Non-Essential Bits. Eventually their results are derived from the Essential Bits via the last layer of GP-blocks at the output side (denoted as Residual GP Restoration).

Overall, the Group-Segregation in KSA is responsible for the creation of two mutually exclusive groups. On the other hand, the Residual GP Restoration for the Non-Essential Bits is responsible for computing the block GP results from the Essential Bits. These two building blocks make it possible to design general Sparse PPA's.

B. Extension from 2-Sparse PPA to k-Sparse PPA

For a 2-Sparse design, the Group-Segregation and Residual GP Restoration are both straightforward to implement, each consisting of one layer of GP blocks. However, extending from 2 to a k-Sparse design will significantly increase the overhead in these two building blocks. In fact, this is a necessary price to pay when more than half of the bits are made Non-Essential. Moreover, a variety of design options can be explored to realize the functionalities of Group-Segregation and Residual GP Restoration.

1) Group-Segregation designs for k-Sparse PPA's: The Group-Segregation stage functions to "bootstrap" the Essential Bits, such that when they are sent through the Sub-Adder, no interactions are needed with the Non-Essential Bits. For a 2-Sparse design, this means every Essential Bit (every 2nd bit) e needs to compute (Ge:e−1, Pe:e−1) at the Group-Segregation stage. For a k-Sparse adder, every k-th bit e now needs to compute (Ge:e−k+1, Pe:e−k+1), which is identical to a k-bit Block-level GP computation.

To form a k-bit block GP computation, design choices range from 1 level (using a k-input GP computation), through log2 k levels (using 2-input GP computations), to k − 1 levels (using a Ripple-Carry-Adder-fashioned sequential computation). Such choices are shown in Figure 3(a) to (c) for an example of an 8-Sparse design. Note that the 1-level design shown in Figure 3(c) essentially employs a high-radix [11] GP Block with 8 inputs, which involves a large overhead internally within each GP operator. In the experimental result section, we choose the form in Figure 3(a), where the Group-Segregation always requires log2 k levels.

Fig. 3: A variety of Group-Segregation Block structures: (a) in log2 8 = 3 levels; (b) in 8 − 1 = 7 levels; (c) in 1 level, with an 8-input GP Block with RCA-like structure.

Fig. 4: Various Residual GP Restoration options: (a) after Group-Segregation in log2 8 = 3 levels; (b) after RCA-like Group-Segregation; (c) after single-level Group-Segregation.

2) Residual GP Restoration designs for k-Sparse PPA's: For each Non-Essential Bit n, to achieve the final block GP signal (Gn:0, Pn:0) without a Sub-Adder, two tasks are required, both related to the nearest Essential Bit e to the right of n (assuming n > e). First, the Non-Essential Bit n needs to compute its local block GP signal (Gn:e+1, Pn:e+1), and this can be done without the results from the Essential Bits. Then, once the result of the Essential Bit e, (Ge:0, Pe:0), is available from the Sub-Adder, bit n will integrate the result of

(Ge:0, Pe:0) with its local (Gn:e+1, Pn:e+1) to derive its own global block GP result (Gn:0, Pn:0).

Figure 4 shows various structure options for Residual GP Restoration, corresponding to the Group-Segregation stages given in Figure 3. The design choice of the Residual GP Restoration stage should mirror that of the Group-Segregation, so as to maximize the reusability of intermediate results and minimize hardware overhead.

Overall, similar to Group-Segregation, the Residual GP Restoration has various structural options with different complexities and performance. In the experimental results of this paper, we select the structure of Figure 4(a), which matches that of Figure 3(a).

3) Examples: Figure 5 shows three examples of 32-bit LFA's with 1-, 2-, and 4-Sparse structures. One can easily observe the reduction of fan-outs with increasing sparsity of the same PPA structure. Table I shows the corresponding parameters, normalized with respect to the default 1-Sparse design. The improvements in the number of GP Blocks required, power, and delay can be observed with increased sparsity in the LFA. On the other hand, longer interconnects are needed for LFA's with higher sparsities. How the parameters are calculated is discussed in the next section.

Fig. 5: Various designs for 32-bit LFA's with changing sparsity: 1-Sparse (default design, all bits are Essential); 2-Sparse (1/2 of the bits are Essential); 4-Sparse (1/4 of the bits are Essential).

TABLE I: Parameters for the three designs shown in Figure 5, normalized to the 1-Sparse design

Sparsity   Number of GP Blocks   Interconnect Length   Power    Delay
1          1                     1                     1        1
2          0.7875                1.0404                0.7527   0.65
4          0.8125                1.0699                0.7793   0.5

4) Overall Design: An overall design framework of an m-bit k-Sparse PPA is shown in Figure 6. The structure starts with the Group-Segregation stage, which provides the initial block GP signals for all the m/k Essential Bits (shown in bold lines), enabling them to carry out their independent computations later on.

Next, all the Essential Bits are channeled to the m/k-bit Sub-Adder. This building block computes (with any choice of PPA structure) the final block-level GP signals for each Essential Bit e: (Ge:0, Pe:0).

In parallel with the Sub-Adder, the (m − m/k) Non-Essential Bits (shown in thin lines) are sent to the Residual GP Restoration building block. Here, the Non-Essential Bits first produce their local block GP signals, in parallel with the computation being carried out at the Sub-Adder. Then, the results from the Sub-Adder are taken in to derive the final block GP (Gn:0, Pn:0) for every Non-Essential Bit n.

Fig. 6: Overall design structure of an m-bit k-Sparse PPA

With increased sparsity, more computations are needed for the Non-Essential Bits, while the necessitated computations for the Essential Bits gradually decrease. As a result, when sparsity increases, the Residual GP Restoration block expands, while the Sub-Adder (with a smaller number of input bits) shrinks.
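The flow of Figure 6 can be sketched end to end in C. This is our own illustrative sketch (names ours; m ≤ 32 and divisible by k), with the Sub-Adder modeled as a simple sequential prefix over the Essential Bits, standing in for any chosen PPA structure:

```c
#include <stdint.h>

typedef struct { int g, p; } gp_t;

/* (Gh,Ph) o (Gl,Pl) = (Gh + Ph*Gl, Ph*Pl) */
static gp_t gp_op(gp_t hi, gp_t lo) {
    return (gp_t){ hi.g | (hi.p & lo.g), hi.p & lo.p };
}

/* m-bit k-Sparse addition following the Fig. 6 flow. Essential Bits
 * are e = k-1, 2k-1, ...; returns an (m+1)-bit result. */
uint64_t sparse_add(uint32_t a, uint32_t b, int m, int k) {
    gp_t gp[32], seg[32], blk[32];
    for (int i = 0; i < m; i++) {                 /* bit-level GP */
        int ai = (a >> i) & 1, bi = (b >> i) & 1;
        gp[i] = (gp_t){ ai & bi, ai ^ bi };
    }
    /* Group-Segregation: each Essential Bit e folds its k-bit group
     * into (G_e:e-k+1, P_e:e-k+1). */
    for (int e = k - 1; e < m; e += k) {
        seg[e] = gp[e - k + 1];
        for (int i = e - k + 2; i <= e; i++)
            seg[e] = gp_op(gp[i], seg[e]);
    }
    /* Sub-Adder over the m/k Essential Bits: (G_e:0, P_e:0). */
    blk[k - 1] = seg[k - 1];
    for (int e = 2 * k - 1; e < m; e += k)
        blk[e] = gp_op(seg[e], blk[e - k]);
    /* Residual GP Restoration: each Non-Essential Bit n combines its
     * local (G_n:e+1, P_n:e+1) with (G_e:0, P_e:0) of the nearest
     * Essential Bit e below it (none exists for the lowest group). */
    for (int n = 0; n < m; n++) {
        if (n % k == k - 1) continue;             /* Essential Bit */
        int e = (n / k) * k - 1;
        gp_t local = gp[e + 1];
        for (int i = e + 2; i <= n; i++)
            local = gp_op(gp[i], local);
        blk[n] = (e >= 0) ? gp_op(local, blk[e]) : local;
    }
    /* Carry/sum: C_i = G_i-1:0, S_i = P_i:i ^ C_i. */
    uint64_t sum = 0;
    for (int i = 0; i < m; i++) {
        int ci = (i == 0) ? 0 : blk[i - 1].g;
        sum |= (uint64_t)(gp[i].p ^ ci) << i;
    }
    sum |= (uint64_t)blk[m - 1].g << m;
    return sum;
}
```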

Fig. 7: Representing an arbitrary PPA in a 2D space using Cartesian coordinates, and estimation of interconnect lengths via the half-perimeter method. The logical interconnects from a source GP Block at (x1, y1) to destinations (x2, y2), (x3, y3), (x4, y4) are modeled physically by the half-perimeter of the smallest rectangle that encompasses all the GP Blocks.

IV. ASSOCIATED METRICS

We follow the metrics provided by [3] for the estimation of area, interconnect, power, and performance. The estimations given here can be scaled according to any given feature size.

Area can be represented by two factors: 1) the area required by the adder in the coordinate system shown in Figure 7; 2) the number of GP Blocks in the design.

To estimate the interconnect requirements, we model the length of an interconnect to be equal to half of the perimeter of the smallest rectangle encompassing the source and the destinations of the interconnect, as shown in Figure 7.

Power (P) has two components: static and dynamic power. Static power (Ps) can be assumed to be fixed for a given GP Block. Thus, the static power consumed by the adder is proportional to the number of GP Blocks (|GP|). On the other hand, dynamic power (Pd) is proportional to the fan-outs (fo) of the GP Blocks. Overall, the power can be estimated as:

P = Ps + Pd = a × fo + b × |GP|

Here a and b are scaling factors, which we assume to be 1 and 0.5 respectively.

To estimate the performance in terms of delay (D), we use the concept of logical effort, which [3] simplifies as:

Dl = 1.5 × C + 2.5,  Dr = 2 × C + 2.5

Delay is estimated from the critical path in the adder. If a GP Block is located along the critical path and the critical path enters it from the right fan-in, then we consider the delay to be Dr. Otherwise, when the fan-in comes from the left, the delay at the GP Block is considered as Dl. Computation of the overall delay is performed by considering all the GP Blocks along the critical path, and the fan-in paths of the critical path. In the given equations, C indicates the load capacitance of the corresponding GP Block, which is proportional to its fan-out. We apply a scaling factor of 0.5 to estimate C = 0.5 × fo.

V. EXPERIMENTAL RESULTS

A number of PPA structures including KSA, HCA, LFA, BKA (Brent-Kung Adder), and RCA are explored to evaluate their sparse variations. We evaluated 64-bit adders with sparsity ranging from 1 to 8. Figure 8 shows the results (area, interconnect, GP Block number, power, delay, and Power-Delay Product) for each type of PPA as sparsity changes. The simulation was carried out using the metrics described in Section IV and implemented in a C program.

Fig. 8: Various metrics evaluated on 64-bit Sparse PPA's with changing sparsity on various PPA types

For high-performance adders, such as KSA, a general trend is that delay is the only negative impact of increased sparsity.
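Under the stated assumptions of Section IV (a = 1, b = 0.5, C = 0.5 × fo), the power and per-block delay estimates reduce to a few lines of C. The following is a hedged sketch of what such an estimator's core could look like (function names ours; we read fo in the power formula as the fan-out summed over all GP Blocks):

```c
/* Power estimate P = a*fo + b*|GP| with a = 1, b = 0.5:
 * the fan-out term models dynamic power, the block count static power. */
double est_power(int total_fanout, int num_gp_blocks) {
    return 1.0 * total_fanout + 0.5 * num_gp_blocks;
}

/* Per-block logical-effort delay with load C = 0.5*fo:
 * Dr = 2*C + 2.5 when the critical path enters the right fan-in,
 * Dl = 1.5*C + 2.5 when it enters the left fan-in. */
double est_block_delay(int fanout, int enters_from_right) {
    double c = 0.5 * fanout;
    return enters_from_right ? 2.0 * c + 2.5 : 1.5 * c + 2.5;
}
```

The overall delay would then be accumulated over the GP Blocks along the critical path, as described above.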

All other metrics show improvement with a more sparse design. For HCA, a design of sparsity 3 provides the optimal case in power, while a sparsity of 2 provides the optimal point for the metric of Power-Delay Product (PDP) among all the choices.

For LFA's, which have a high fan-out for GP Blocks, an increase in sparsity does not significantly affect area or interconnect. However, delay decreases consistently with increasing sparseness, due to the decreasing fan-outs. For power, we can observe an optimum point at a sparsity of 3. PDP also has a downward trend, but the rate of reduction goes down as the sparsity increases.

For BKA's, the power and PDP are lower at the typical 1-Sparse design than at higher sparsities. Nevertheless, we can observe the fastest possible BKA at sparsity 6.

Finally, for RCA, while area, power, the number of GP Blocks, and interconnect length increase, there is a big performance improvement with increased sparsity. For instance, a 2-Sparse RCA is almost twice as fast as a default 1-Sparse RCA. Furthermore, PDP also has a decreasing trend with increasing sparsity, indicating the presence of an exploitable power-delay trade-off.

Figure 9 shows the design space for the plethora of sparse PPA's, including KSA, HCA, LFA, BKA, and RCA, in the power-delay domain. While the KSA and RCA 1-Sparse and 2-Sparse designs show a large variation between them in terms of power-delay, in most of the cases the changes can be observed to be rather small. Such small changes are shown in the zoomed-in plot of Figure 10. Thus sparsity exploration helps in "fine-tuning" the adder parameters, as shown in this region.

Fig. 9: The 64-bit adder design space in the power-delay domain

Fig. 10: A zoomed-in view of Figure 9

VI. CONCLUSION

We have proposed a general design framework for constructing Sparse PPA's of various sparsities and forms. This approach is built on the insight derived from the inherent property of KSA's and is generalized on two fronts: the sparsity and the Sub-Adder structure. The overall scheme works by: 1) enabling independent computation via a Group-Segregation stage; 2) treating Essential Bits and Non-Essential Bits differently, using only one copy of the Sub-Adder to compute for the Essential Bits; 3) using the results from the Essential Bits to derive the final results of the Non-Essential Bits. The proposed framework includes further design flexibilities regarding how to implement the Group-Segregation and Residual GP Restoration building blocks. Simulation results verify that the proposed design framework indeed opens up a large design space with various choices along the Pareto front, in terms of performance, area, and power.

REFERENCES

[1] M. Aktan, D. Baran, and V. Oklobdzija, "Minimizing Energy by Achieving Optimal Sparseness in Parallel Adders," in IEEE 22nd Symposium on Computer Arithmetic (ARITH), 2015, pp. 10–17.
[2] S. Mathew, M. Anders, R. Krishnamurthy, and S. Borkar, "A 4-GHz 130-nm Address Generation Unit with 32-bit Sparse-Tree Adder Core," IEEE Journal of Solid-State Circuits, vol. 38, no. 5, pp. 689–695, May 2003.
[3] J. Liu, Y. Zhu, H. Zhu, C. Cheng, and J. Lillis, "Optimum prefix adders in a comprehensive area, timing and power design space," in Asia and South Pacific Design Automation Conference, 2007, pp. 609–615.
[4] S. Roy, M. Choudhury, R. Puri, and D. Pan, "Towards optimal performance-area trade-off in adders by synthesis of parallel prefix structures," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 33, no. 10, pp. 1517–1530, 2014.
[5] R. Zlatanovici, S. Kao, and B. Nikolic, "Energy–Delay Optimization of 64-Bit Carry-Lookahead Adders With a 240 ps 90 nm CMOS Design Example," IEEE Journal of Solid-State Circuits, vol. 44, no. 2, pp. 569–583, Feb 2009.
[6] B. Zeydel, D. Baran, and V. Oklobdzija, "Energy-Efficient Design Methodologies: High-Performance VLSI Adders," IEEE Journal of Solid-State Circuits, vol. 45, no. 6, pp. 1220–1233, June 2010.
[7] J. Sklansky, "Conditional-Sum Addition Logic," IRE Transactions on Electronic Computers, vol. EC-9, no. 2, pp. 226–231, June 1960.
[8] O. Bedrij, "Carry-Select Adder," IRE Transactions on Electronic Computers, vol. EC-11, no. 3, pp. 340–346, June 1962.
[9] B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs, 2nd ed. Oxford University Press, New York, 2010.
[10] S. Banerjee and W. Rao, "A general approach for highly defect tolerant parallel prefix adder design," in Design, Automation and Test in Europe Conference and Exhibition, 2016, pp. 666–671.
[11] F. Gurkaynak, Y. Leblebici, L. Chaouati, and P. McGuinness, "Higher radix Kogge-Stone parallel prefix adder architectures," in IEEE International Symposium on Circuits and Systems, vol. 5, 2000, pp. 609–612.

