
Microprocessors and Microsystems 74 (2020) 103009
Contents lists available at ScienceDirect
Microprocessors and Microsystems

journal homepage: www.elsevier.com/locate/micpro
Delay and area efficient approximate multiplier using reverse carry propagate full adder
V.J. Arulkarthick1,∗, Abinaya Rathinaswamy2
Department of ECE, JCT college of Engineering and Technology, Tamil Nadu, India
Article info

Article history: Received 7 November 2019; Revised 3 January 2020; Accepted 28 January 2020; Available online 29 January 2020

Keywords: Microprocessor; Communication; Microsystems

Abstract

Approximate computing improves the performance and power of error-resilient applications while reducing design complexity. This paper presents a new method for the approximation of multipliers. The partial products of the multiplier are altered to produce terms with varying probabilities. Based on probability statistics, the logic complexity of the approximation varies with the accumulation of the altered partial products. The approximation is implemented in two variants of a 16-bit multiplier, with a reverse carry propagate adder (RCPA) in the final stage. In the RCPA, the carry signal propagates from the most significant bit (MSB) to the least significant bit (LSB), so the input carry has greater significance than the output carry. Propagating the carry in reverse order increases stability under delay variations. Using the RCPA in the approximate multiplier provides 21% and 7% improvements in area and delay, respectively. In comparison, this structure is more resilient to delay variations than the ideal approximate adder.

© 2020 Published by Elsevier B.V.
∗ Corresponding author.
E-mail addresses: karthick.arul@gmail.com (V.J. Arulkarthick), abinayarathinaswamy@gmail.com (A. Rathinaswamy).
1 Dr. V.J. Arulkarthick, Assistant Professor (Selection Grade).
2 Abinaya Rathinaswamy, Assistant Professor.
https://doi.org/10.1016/j.micpro.2020.103009
0141-9331/© 2020 Published by Elsevier B.V.

1. Introduction

Multipliers are the largest component in signal processing applications, whose design mainly focuses on high speed, low area and low power. These goals are achieved by approximate multipliers. In error-tolerant applications such as image processing, approximate computing is an increasingly used approach to reduce power consumption. Multipliers with approximate outputs are employed in applications that can tolerate some loss of quality in exchange for energy-efficient computation. Nevertheless, identifying the most suitable approximate multiplier is difficult, since accuracy must be considered alongside power, area and performance. This paper deals with a 4-bit adder, so the output may vary from 0000 to 1111 with a carry out of 0 or 1, depending on the input. The existing system has a delay of 15.582 ns, whereas the proposed system has a delay of 14.049 ns. This decrease in delay increases the speed of operation and improves efficiency. The end product is a delay-efficient binary coded decimal adder, which provides a coded representation for a decimal input.

The selection of an approximate multiplier is based on the following three factors: first, the type of approximate full adder employed to build the multiplier; second, the main structure, i.e., tree or array multiplier; and third, the arrangement of the approximated sub-blocks in the main multiplier. The design space is analyzed for circuit-level implementations of approximate multipliers based on these factors. The circuit-level implementations used here are a variety of the most widely used approximate full adders. Higher-order approximate compressor circuits for the construction of an 8 × 8 multiplier are developed from these full adder blocks. The main objective is to design an approximate multiplier with efficient delay and area. The design of a digital processing unit for compact systems has size reduction and speed improvement as key goals. Generally, accurate processing units increase speed at the cost of more power consumption. Reducing accuracy is one way to improve power and speed. As noted in [1], the need to support energy-constrained devices in numerous digital signal processing and classification applications is increasing steadily. Fixed-point arithmetic is used to perform matrix multiplication in these applications, which results in a few errors in measurement. In return, the energy efficiency of multiplication increases. The work in [2] proposed a multiplier architecture that trades off accuracy against energy consumption at design time. The modified multiplier consumes 58% less energy than an exact multiplier, with a computational error of approximately 1%. The quality of DSP applications and the accuracy of classification are not affected by such a small error in summation. In [3], a modified Booth multiplier design with a fixed-width method is suggested. The outputs of the Booth encoder are used to generate the error compensation, to reduce the hardware complexity and to compensate for the quantization error. Based on their effect on the quantization error, the truncated bits are divided into two parts, and a different error compensation strategy is applied to each part. The error compensation strategy is applied to a fixed-width modified Booth multiplier, which produces a product of the same width as its input. In comparison with the existing technique, this compensation method reduces the quantization error by up to five hundred, with a hardware overhead in the generation circuit. The proposed method also reduces area and power consumption by 35% compared with the ideal multiplier design.

Fig. 1. Transformed partial products.

Fig. 3. Block diagram of the RCPFA.
Fig. 2. Ripple carry with BCD.

Fig. 4. n-bit RCPA.

2. Existing system

Precise computing units are not always necessary in multimedia signal processing and data processing applications, where errors can be tolerated. These computing units can be replaced by their approximate counterparts. Research on approximate computing for error-tolerant applications is increasing day by day. Adders and multipliers are the major components of these applications. Implementing an imprecise multiplier involves the following stages: partial product generation, reduction of the partial products in a tree and, finally, vector addition of the sum and carry produced by the reduction tree to give the final product.

The second stage consumes more power, so the approximation is employed in the reduction tree. An 8-bit unsigned multiplier is used as an example to explain the approximation technique [4]. Consider two 8-bit inputs $\alpha = \sum_{m=0}^{7} \alpha_m 2^m$ and $\beta = \sum_{n=0}^{7} \beta_n 2^n$. The partial product $a_{m,n} = \alpha_m \cdot \beta_n$ in Fig. 3.1 is the result of the AND operation between

the bits of $\alpha_m$ and $\beta_n$. If a column contains more than three partial products, the products $a_{m,n}$ and $a_{n,m}$ are combined to form propagate and generate signals [5]. The transformed partial products $p_{m,n}$ and $g_{m,n}$ are generated from the propagate and generate signals. They result from altering the partial products $a_{m,n}$ and $a_{n,m}$ in columns 3 to 11, with weights $2^3$ to $2^{11}$.

$$p_{m,n} = a_{m,n} + a_{n,m} \tag{3.1}$$

$$g_{m,n} = a_{m,n} \cdot a_{n,m} \tag{3.2}$$

Fig. 5. Kmap for sum and carry of ideal RCPFA.

2.1. Approximation of altered partial products $g_{m,n}$

Generate signals are arranged in columns. The probability of each element being one is 1/16, and the probability of two elements in a column being one is lower still, so an OR gate approximation in the tree for the generate elements in a column gives an accurate result in most cases [6]. The probability of misprediction is very low. The probability of error increases with the number of generate signals, and the error value rises linearly. This can be limited by restricting each OR gate group to at most 4 generate signals.

2.2. Approximation of other partial products

The other partial products are accumulated with approximate units: a half adder, a full adder and a 4-2 compressor with approximate outputs. These circuits have sum and carry as their outputs. An error in the carry produces an error difference of 2 in the output, since the carry has a greater binary weight. The absolute difference between the exact and approximate outputs is kept at one to bound the approximation error [7]. Therefore, the carry is approximated only in cases where the sum is approximated. XOR gates in the compressor and adder circuits contribute significantly to area and delay. The XOR gate in the sum expression of the half adder is replaced by an OR gate, as in Eqs. (3.3) and (3.4). This leads to one error in the sum output.

$$\mathrm{Sum} = X_1 + X_2 \tag{3.3}$$

$$\mathrm{Carry} = X_1 \cdot X_2 \tag{3.4}$$

The full adder is approximated by replacing one of the two XOR gates in the sum expression by an OR gate. As a result, 2 of the 8 sum outputs are in error. The carry, modified as in Eq. (3.7), introduces one error. This modification gives considerable simplification while keeping the difference between the exact and approximate outputs at one.

$$W = (X_1 + X_2) \tag{3.5}$$

$$\mathrm{Sum} = W \oplus X_3 \tag{3.6}$$

$$\mathrm{Carry} = W \cdot X_3 \tag{3.7}$$

In the 4-2 compressor, three output bits are needed only when all four inputs are one, which happens once in 16 input combinations. This property is used to eliminate one of the three output bits of the 4-2 compressor. To keep the error difference at one, the output "100" for four inputs of one is replaced by the output "11". One of the three XOR gates in the sum expression is replaced by an OR gate. An additional term $X_1 \cdot X_2 \cdot X_3 \cdot X_4$ is included in the sum calculation to produce a sum of one when all the inputs are one [8]. Five errors
are produced out of 16 outputs. The simplified expression for the carry is shown in (3.11).

$$W_1 = X_1 \cdot X_2 \tag{3.8}$$

$$W_2 = X_3 \cdot X_4 \tag{3.9}$$

$$\mathrm{Sum} = (X_1 \oplus X_2) + (X_3 \oplus X_4) + W_1 \cdot W_2 \tag{3.10}$$

$$\mathrm{Carry} = W_1 + W_2 \tag{3.11}$$

The accuracy of the ripple carry adder and BCD adder (Fig. 2) is reduced to enhance speed and energy efficiency. Carry propagate adders use the standard FA as their basic block, with three inputs of the same weight. The FA has two outputs: a sum with the same weight as the inputs and a carry output with double the weight. In multistage adders, the critical path delay is determined by the propagation delay of the carry in each FA unit, $t_{cp}$. In the worst case, the carry propagation delay of the adder is $n \times t_{cp}$, where $n$ is the number of stages in the adder. Therefore, a clock period shorter than $n \times t_{cp}$ can cause errors through setup time violations. A small delay variation can cause a large error in the MSBs of the sum [9], because the carry input of the MSB stage is generated and propagated through the FAs of the less significant bits. Depending upon this fact, the error caused by timing violations is reduced if the carry propagates in reverse order. This motivated the use of approximate full adders [10].

Fig. 6. The internal structure and Truth table of RCPFA-I.

Fig. 7. The internal structure and Truth table of RCPFA-II.

Fig. 8. The internal structure and Truth table of RCPFA-III.

3. Reverse carry propagate full-adder cell

3.1. The sum and carry of each literal full adder are given by the equation below

$$2C_{i+1} + S_i = A_i + B_i + C_i \tag{4.1}$$

where $A_i$ and $B_i$ are the $i$th bits of the inputs A and B, $C_i$ and $C_{i+1}$ are the input and output carries, and $S_i$ is the sum of the $i$th bit. According to this equation, the sum and carry of the $i$th stage depend on the inputs A, B at the $i$th bit position and the carry out from the previous stage. Eq. (4.1) can be rearranged by moving $C_i$ and $C_{i+1}$ to the opposite sides.

$$S_i - C_i = A_i + B_i - 2C_{i+1} \tag{4.2}$$

According to this equation, the working principle of the RCPA architecture depends on the inputs of the current stage and the carry output of the next stage. In this structure, the outputs sum
and carry have the same significance. The input carry $C_{i+1}$ of the current $i$th stage is supplied by the FA of the $(i+1)$th stage. The exact output corresponding to the inputs is selected from the set {−2, −1, 0, 1, 2}. However, given the significance of the outputs, the output can only be selected from the set {−1, 0, 1}, which leads to inexact outputs. In particular, when the RHS of (4.2) is −2 or 2, the output becomes unreliable. Moreover, either (0, 0) or (1, 1) may be considered for $S_i$ and $C_i$ when the RHS of (4.2) is zero. One way to select between the two answers is to use an auxiliary signal derived from the inputs of the $(i-1)$th stage. Based on the above, the full adder structure of the RCPFA is shown in Fig. 4.1. The RCPFA has 4 inputs and 3 outputs.

Fig. 9. Area analysis of proposed multiplier.

Fig. 10. Area analysis of existing multiplier.

The inputs to this full adder are $A_i$, $B_i$, the forecast signal $F_i$ and the carry out of the next stage, $C_{i+1}$. The sum $S_i$, carry $C_i$ and forecast signal $F_{i-1}$ are its output signals. When the RHS of (4.2) is zero, the $F_i$ signal is used to select one of the two pairs. The n-bit RCPA structure is shown in Fig. 4.2.

Fig. 11. Delay analysis of existing multiplier.

Fig. 12. Delay analysis of proposed multiplier.

In this n-stage RCPA, the carry input $C_n$ of the MSB stage is taken to be equal to the output $F_n$ of that stage, so some inexact results may be generated in the approximate adder. $F_0$ for the LSB stage is assumed to be equal to $C_0$ of the n-bit RCPA, because there is no preceding stage for the zeroth stage. The analytical flow of operation is shown in Fig. 4.2. As in the RCA, incomplete carry propagation causes some errors in the RCPA, in addition to its intrinsic error. The main advantage of the RCPA is that the error decreases as the bit significance increases. That is, the error due to delay
variation during carry propagation is lower for the most significant bit.

4. Internal structure of RCPFA

The structure of the RCPFA is derived from the K-map of the sum $S_i$ and carry $C_i$ obtained from Eq. (4.2) together with the input forecast signal. The $S_i$ and $C_i$ signals are given by the following Boolean relations between the inputs.

$$S_i = \bar{C}_{i+1} F_i + \bar{C}_{i+1} A_i + \bar{C}_{i+1} B_i + A_i B_i F_i \tag{4.3}$$

$$C_i = C_{i+1} F_i + C_{i+1} \bar{A}_i + C_{i+1} \bar{B}_i + \bar{A}_i \bar{B}_i F_i \tag{4.4}$$

Equations (4.3) and (4.4) are further reduced to the gate-level RCPFA structure.

$$S_i = F_i (\bar{C}_{i+1} + A_i B_i) + \bar{C}_{i+1} (A_i + B_i) = F_i X_i + Y_i \tag{4.5}$$

$$C_i = F_i\, \overline{\bar{C}_{i+1} (A_i + B_i)} + \overline{\bar{C}_{i+1} + A_i B_i} = F_i \bar{Y}_i + \bar{X}_i \tag{4.6}$$

The performance and accuracy of this adder depend on the forecast signal F, with some overhead. This means the general RCPFA structure can be simplified by optimizing the forecast signal. Here, the forecast signal can be generated in three ways. The truth tables and the gate-level structures of the RCPFA are shown in Fig. 4.4–4.6. The first, general form of the RCPFA is obtained from Eqs. (4.5) and (4.6).

In the second structure, the signal F is the carry generate signal ($A_i$ AND $B_i$), and in the third structure, F is the carry alive signal ($A_i$ OR $B_i$).

The states in the truth table with $X_i = 1$ do not appear when the carry alive signal is used as the F signal, so $X_i$ is replaced with zero to produce the further simplified structure in Fig. 4.5. In the same way, Fig. 4.6 is obtained by selecting the generate signal as the forecast signal and replacing $Y_i$ with 1.

The first RCPFA structure uses 26 transistors, which is fewer than a standard FA. Simplifying RCPFA-I yields structures II and III, with 16 transistors and 10 transistors less than the previous one, and the forecast signal uses 4 transistors.

5. Result and discussion

The synthesized results of the approximate multiplier with the reverse carry propagate full adder and with the ripple carry adder are analysed in terms of area, delay and power in Xilinx.

Comparing the power, area and delay of the proposed multiplier with the existing multiplier using a ripple carry adder, the power is almost the same in both multipliers, but the area and delay overheads are reduced in the proposed multiplier.

Fig. 13. Power analysis of proposed multiplier.

Fig. 14. Output Waveform.

6. Conclusion

In this paper, the generate and propagate signals are used to modify the multiplier's partial products for the implementation of an efficient approximate multiplier. The altered partial products are approximated by a simple OR gate. An approximate full adder, half adder, 4-2 compressor and approximate RCPFAs are proposed to reduce the remaining partial products. Propagating the carry in the reverse adder from MSB to LSB achieves greater stability in delay. In this approximate multiplier, approximation is applied only to the n−1 least significant part of the multiplier. The proposed multiplier architecture achieves a considerable reduction in delay and area compared with the standard one.

Using the approximate RCPFA in the approximate multiplier gives 21% and 7% improvements in area and delay, respectively. Compared with the present approximate multiplier designs, the proposed multiplier has better accuracy. Applications that need significant savings in power and area with minimal output loss can use the proposed multiplier design. Its DSP applications include FIR filtering and the evaluation of accuracy and efficiency in DCT for JPEG compression.

As future work, the multiplier and RCPFA can be implemented using pass transistor logic and transmission gates. This would lead to a lower transistor count and power consumption, and to higher reliability and speed.

7. Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Declaration of Competing Interest

This paper has not been submitted anywhere else; it is submitted only to this journal for publication, with the knowledge of all co-authors.

References

[1] S. Narayanamoorthy, H.A. Moghaddam, Z. Liu, T. Park, N.S. Kim, Energy efficient approximate multiplication for digital signal processing and classification applications, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 23 (6) (2015) 1180–1184.
[2] C. Liu, J. Han, F. Lombardi, A low-power, high-performance approximate multiplier with configurable partial error recovery, Proc. Conf. Exhibit. (DATE) (2014) 1–4.
[3] C.H. Lin, C. Lin, High accuracy approximate multiplier with error correction, in: Proc. IEEE 31st Int. Conf. Comput. Design, 2013, pp. 33–38.
[4] K.-J. Cho, K.-C. Lee, J.-G. Chung, K.K. Parhi, Design of low-error fixed width modified booth multiplier, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 12 (5) (2004) 522–531.
[5] J. Han Momeni, P. Montuschi, F. Lombardi, Design and analysis of approximate compressors for multiplication, IEEE Trans. Comput. 64 (4) (2015) 984–994.
[6] V. Gupta, D. Mohapatra, A. Raghunathan, K. Roy, Low-power digital signal processing using approximate adders, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 32 (1) (2013) 124–137.
[7] N. Zhu, W.L. Goh, W. Zhang, K.S. Yeo, Z.H. Kong, Design of low-power high-speed truncation-error-tolerant adder and its application in digital signal processing, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 18 (8) (2010) 1225–1229.
[8] Z. Yang, A. Jain, J. Liang, J. Han, F. Lombardi, Approximate XOR/XNOR-based adders for inexact computing, in: Proc. 13th IEEE Int. Conf. Nanotechnol. (NANO), 2013, pp. 690–693.
[9] G. Zervakis, K. Tsoumanis, S. Xydis, D. Soudris, K. Pekmestzi, Design-efficient approximate multiplication circuits through partial product perforation, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 24 (10) (2016) 3105–3117.
[10] J. Han, J. Liang, F. Lombardi, New metrics for the reliability of approximate and probabilistic adders, IEEE Trans. Comput. 63 (9) (2013) 1760–1771.

Arulkarthick V J, Ph.D, received his B.E. in Electrical and Electronics Engineering from Bharathiyar University, Coimbatore, and his M.E. in Applied Electronics and Ph.D in Information and Communication Engineering from Anna University, Chennai. His research interests are Signal and Image processing, VLSI Design, and VLSI signal processing. He is currently working as Assistant Professor (Selection Grade) at JCT college of Engineering, Coimbatore. He is a member of IEEE, ISTE.

Abinaya Rathinaswamy received her M.E. degree in VLSI Design and B.E. degree in Electronics and Communication Engineering from Anna University, Chennai. Her current research interests include VLSI Design and Signal Processing. She is currently working as Assistant Professor at JCT college of Engineering, Coimbatore.
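
The error counts quoted in Section 2.2 can be cross-checked by exhaustive enumeration. The Python sketch below is illustrative only and is not the authors' implementation. It assumes that Sum = W XOR X3 in Eq. (3.6) and Sum = (X1 XOR X2) + (X3 XOR X4) + W1.W2 in Eq. (3.10), with "+" read as OR and "." as AND, and it measures the error as the difference between the number of ones at the inputs and Sum + 2 × Carry. All function names are hypothetical.

```python
from itertools import product


def transform_pair(a_mn, a_nm):
    """Eqs. (3.1)-(3.2): propagate p = a OR b, generate g = a AND b."""
    return a_mn | a_nm, a_mn & a_nm


def approx_half_adder(x1, x2):
    """Eqs. (3.3)-(3.4): the XOR of the exact sum is replaced by OR."""
    return x1 | x2, x1 & x2  # (sum, carry)


def approx_full_adder(x1, x2, x3):
    """Eqs. (3.5)-(3.7), assuming Sum = W XOR X3."""
    w = x1 | x2
    return w ^ x3, w & x3  # (sum, carry)


def approx_compressor(x1, x2, x3, x4):
    """Eqs. (3.8)-(3.11); W1.W2 is the X1.X2.X3.X4 term from the text."""
    w1, w2 = x1 & x2, x3 & x4
    return (x1 ^ x2) | (x3 ^ x4) | (w1 & w2), w1 | w2  # (sum, carry)


def error_stats(cell, n_inputs):
    """Return (number of wrong input rows, largest |exact - approx|)."""
    wrong, worst = 0, 0
    for bits in product((0, 1), repeat=n_inputs):
        s, c = cell(*bits)
        diff = abs(sum(bits) - (s + 2 * c))
        wrong += int(diff != 0)
        worst = max(worst, diff)
    return wrong, worst


if __name__ == "__main__":
    cells = [
        ("half adder", approx_half_adder, 2),
        ("full adder", approx_full_adder, 3),
        ("4-2 compressor", approx_compressor, 4),
    ]
    for name, cell, n in cells:
        wrong, worst = error_stats(cell, n)
        print(f"{name}: {wrong} of {2 ** n} rows wrong, max |error| = {worst}")
```

Under these assumptions the enumeration gives 1 erroneous input combination out of 4 for the half adder, 2 out of 8 for the full adder and 5 out of 16 for the 4-2 compressor, each with an error magnitude of one, which agrees with the counts stated in Section 2.2. The `transform_pair` helper also shows why the transformation of Eqs. (3.1) and (3.2) is exact: for single bits, (a OR b) + (a AND b) equals a + b.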

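
The RCPFA cell of Section 4 can be checked against Eq. (4.2) in the same exhaustive way. The sketch below is a reading of the cell under stated assumptions, not the authors' circuit: it takes the complemented form of Eqs. (4.5) and (4.6), with X = NOT C(i+1) OR (A AND B) and Y = NOT C(i+1) AND (A OR B), so that S = F.X + Y and C = F.NOT(Y) + NOT(X). It further assumes that a right-hand side of −2 or 2 in Eq. (4.2) saturates to −1 or 1, and that the forecast signal selects (0, 0) or (1, 1) when the right-hand side is zero. The names `rcpfa`, `clamp` and `cell_matches_eq_4_2` are hypothetical.

```python
from itertools import product


def rcpfa(a, b, c_next, f):
    """One RCPFA cell: returns (s, c) from A_i, B_i, C_{i+1} and F_i.

    Assumed complemented form of Eqs. (4.5)-(4.6):
      X = NOT(c_next) OR (a AND b)
      Y = NOT(c_next) AND (a OR b)
      S = F.X + Y,  C = F.NOT(Y) + NOT(X)
    """
    not_c = 1 - c_next
    x = not_c | (a & b)
    y = not_c & (a | b)
    s = (f & x) | y
    c = (f & (1 - y)) | (1 - x)
    return s, c


def clamp(value, low=-1, high=1):
    """Saturate the right-hand side of Eq. (4.2) to the reachable range."""
    return max(low, min(high, value))


def cell_matches_eq_4_2():
    """Check S - C against the saturated RHS of Eq. (4.2) for all 16 inputs."""
    for a, b, c_next, f in product((0, 1), repeat=4):
        s, c = rcpfa(a, b, c_next, f)
        rhs = a + b - 2 * c_next
        if s - c != clamp(rhs):
            return False
        # When the RHS is zero, the forecast signal picks (0, 0) or (1, 1).
        if rhs == 0 and (s, c) != (f, f):
            return False
    return True


if __name__ == "__main__":
    print("cell consistent with Eq. (4.2):", cell_matches_eq_4_2())
```

With this reading, the cell is exact whenever the right-hand side of Eq. (4.2) lies in {−1, 0, 1}, and the only inexact cases are the two saturated ones, which is consistent with the remark in Section 3 that the output becomes unreliable when the right-hand side is −2 or 2.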