1 s2.0 S0141933119305976 Main
Article info

Article history:
Received 7 November 2019
Revised 3 January 2020
Accepted 28 January 2020
Available online 29 January 2020

Keywords:
Microprocessor
Communication
Microsystems

Abstract

Approximate computing improves the performance and power efficiency of error-resilient applications while reducing design complexity. This paper presents a new method for approximating multipliers. The partial products of the multiplier are altered to produce terms with varying probabilities. Based on probability statistics, the accumulation of the altered partial products leads to a variation in logic complexity. The approximation is implemented in two variants of a 16-bit multiplier, with a reverse carry propagate adder (RCPA) in the final stage. In the reverse carry propagate adder, the carry signal propagates from the most significant bit (MSB) to the least significant bit (LSB), which gives the input carry greater significance than the output carry. Propagating the carry in reverse order increases stability under delay variations. Using the RCPA in the approximate multiplier provides 21% and 7% improvements in area and delay, respectively. By comparison, this structure is more resilient to delay variations than the ideal approximate adder.
© 2020 Published by Elsevier B.V.
1. Introduction

Multipliers are among the largest components in signal processing applications, where designs mainly target high speed, low area, and low power. Approximate multipliers help achieve these goals. In error-tolerant applications such as image processing, approximate computing is an increasingly adopted approach for reducing power consumption. Multipliers with approximated outputs are employed for energy-efficient computation in applications that can tolerate some loss of quality. Nevertheless, identifying the most suitable approximate multiplier, with accuracy as the main criterion alongside power, area, and performance, is difficult. This paper deals with a 4-bit adder, so the output may vary from 0000 to 1111, with a carry out of 0 or 1 depending on the input. The existing system has a delay of 15.582 ns, and the proposed system has a delay of 14.049 ns. This decrease in delay increases the speed of operation and thereby improves efficiency. The end product is a delay-efficient binary coded decimal adder from which a coded representation of a decimal input can be obtained.

The selection of an approximate multiplier is based on the following three factors: first, the type of approximate full adder employed to build the multiplier; second, the main structure, i.e., the tree or array form of the multiplier; and third, the arrangement of the approximated sub-blocks in the main multiplier. The design space of circuit-level implementations of multipliers with approximated outputs is analyzed on the basis of these factors. The circuit-level implementations used here are a variety of the most widely used approximate full adders. Higher-order approximate compressor circuits for the construction of an 8 × 8 multiplier are developed from these full adder blocks. The main objective is to design an approximate multiplier with efficient delay and area. For compact systems, reduction in size and improvement in speed are key goals in the design of a digital processing unit. Generally, exact processing units achieve higher speed at the cost of higher power consumption. Reducing accuracy is one way to improve both power and speed. As noted in [1], the need to support energy-constrained devices in numerous digital signal processing and classification applications is increasing steadily. Fixed-point arithmetic is used to perform matrix multiplication in these applications, which results in a few computational errors. The energy efficiency of multiplication can therefore be increased. The authors of [2] proposed a multiplier architecture that trades accuracy for energy consumption at design time. The modified multiplier consumes 58% less energy than the exact multiplier, with a computational error of approximately 1%. DSP and classification applications are not affected in their quality and accuracy by a small error in summation. In [3], a modified Booth multiplier design with a fixed-width method is suggested. Outputs of the Booth encoder are used to generate intolerance compensation to reduce the hardware

∗ Corresponding author.
E-mail addresses: karthick.arul@gmail.com (V.J. Arulkarthick), abinayarathinaswamy@gmail.com (A. Rathinaswamy).
1 Dr. V.J. Arulkarthick, Assistant Professor (Selection Grade).
2 Abinaya Rathinaswamy, Assistant Professor.
https://doi.org/10.1016/j.micpro.2020.103009
0141-9331/© 2020 Published by Elsevier B.V.
2 V.J. Arulkarthick and A. Rathinaswamy / Microprocessors and Microsystems 74 (2020) 103009
Precise computing units are not always necessary in multimedia signal processing and data processing applications, where errors may be tolerated. Such computing units can be replaced by their approximate counterparts. Research on approximate computing for error-tolerant applications is increasing day by day. Adders and multipliers form the major parts of these applications. The implementation of an imprecise multiplier includes the following stages: partial product generation; reduction of the partial products in a tree; and, finally, vector addition of the sum and carry produced by the reduction tree to give the final product.

The second stage consumes more power; therefore, approximation is employed in the reduction tree. An 8-bit unsigned multiplier is used as an example to explain the technique for approximating multipliers [4]. Consider two 8-bit inputs $\alpha = \sum_{m=0}^{7} \alpha_m 2^m$ and $\beta = \sum_{n=0}^{7} \beta_n 2^n$. The partial product $a_{m,n} = \alpha_m \cdot \beta_n$ in Fig. 3.1 is the result of the AND operation between the bits $\alpha_m$ and $\beta_n$.
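The partial product definition above can be illustrated with a minimal Python sketch. It only models exact partial product generation and exact accumulation for two unsigned 8-bit inputs; it does not model the approximation applied in the reduction tree. The function names `partial_products` and `accumulate_exact` are hypothetical and chosen for illustration only.

```python
from typing import List


def partial_products(alpha: int, beta: int, width: int = 8) -> List[List[int]]:
    """Return the matrix a[m][n] = alpha_m AND beta_n (bit 0 is the LSB)."""
    if not (0 <= alpha < (1 << width) and 0 <= beta < (1 << width)):
        raise ValueError("inputs must be unsigned and fit in `width` bits")
    alpha_bits = [(alpha >> m) & 1 for m in range(width)]
    beta_bits = [(beta >> n) & 1 for n in range(width)]
    return [[alpha_bits[m] & beta_bits[n] for n in range(width)]
            for m in range(width)]


def accumulate_exact(pp: List[List[int]]) -> int:
    """Exact accumulation: partial product a[m][n] has weight 2**(m + n)."""
    return sum(bit << (m + n)
               for m, row in enumerate(pp)
               for n, bit in enumerate(row))
```

For example, `accumulate_exact(partial_products(181, 110))` reproduces the exact product `181 * 110`, which confirms that the 64 AND terms, weighted by $2^{m+n}$, carry all the information that the reduction tree has to accumulate.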
are produced out of 16 outputs. The simplified expression of the carry is shown in (3.11).

$W_1 = X_1 \cdot X_2$ (3.8)

$\text{Sum} = (X_1 \oplus X_2) + (X_3 \oplus X_4) + W_1 \cdot W_2$ (3.10)

$\text{Carry} = W_1 + W_2$ (3.11)

The accuracy of the ripple carry adder and BCD (Fig. 2) is reduced to enhance speed and energy efficiency. Carry propagate adders use the standard FA as their basic block, which has three inputs of the same weight. The FA produces two outputs: a sum with the same weight as the inputs and a carry with double the weight. In multistage adders, the critical path delay is determined by the propagation delay of the carry through each FA unit. In the worst case, the carry propagation delay of the adder is $n \times t_{cp}$, where $n$ is the number of stages in the adder. A clock period smaller than $n \times t_{cp}$ can therefore cause errors through setup-time violations. A small delay variation that affects the MSBs of the sum causes a large error [9]. This is because the carry input of the MSB is generated by, and propagated through, the FAs of the less significant bits. Given this fact, the error caused by timing violations is reduced if the carry propagates in reverse order. This encouraged the use of approximate full adders [10].

3.1. The sum and carry of each literal full adder are given by the equation below

$2C_{i+1} + S_i = A_i + B_i + C_i$ (4.1)

where $A_i$ and $B_i$ are the $i$th bits of the inputs A and B, $C_i$ and $C_{i+1}$ are the input and output carries, and $S_i$ is the sum of the $i$th bit. According to this equation, the sum and carry of the $i$th stage depend on the inputs A and B at the $i$th bit position and the carry out of the previous stage. Eq. (4.1) can be rearranged by moving $C_i$ and $C_{i+1}$ to opposite sides:

$S_i - C_i = A_i + B_i - 2C_{i+1}$ (4.2)

According to this equation, the working principle of the RCPA architecture depends on the inputs of the current stage and the carry output of the next stage. In this structure, the outputs sum
and carry have the same significance. The input carry $C_{i+1}$ to the current $i$th stage is generated by the FA of the $(i+1)$th stage. An exact output corresponding to the inputs would have to be selected from the set {−2, −1, 0, 1, 2}. Given the significance of the outputs, however, the output can only be selected from the set {−1, 0, 1}, which leads to inexact outputs. In particular, when the RHS of (4.2) is −2 or 2, the output is unreliable. When the RHS of (4.2) is zero, either (0, 0) or (1, 1) may be chosen for $S_i$ and $C_i$. One way to select between these two answers is to use an auxiliary signal generated from the inputs of the $(i-1)$th stage. Based on these details, the full adder structure for the RCPFA is shown in Fig. 4.1. The RCPFA has 4 inputs and 3 outputs.

The inputs to this full adder are $A_i$, $B_i$, the forecast signal $F_i$, and the carry out of the next stage $C_{i+1}$. The sum $S_i$, carry $C_i$
and forecast signal $F_{i-1}$ are its output signals. When the RHS of (4.2) is zero, the $F_i$ signal is used to select one of the two pairs. The n-bit RCPA structure is shown in Fig. 4.2.

In this n-stage RCPA, the carry input $C_n$ of the MSB stage is assumed to be equal to the output $F_n$ of that stage. Because of this, some inexact results may be generated in the approximate adder. $F_0$ for the LSB stage is assumed to be equal to $C_0$ of the n-bit RCPA, because there is no preceding stage for the zeroth stage. The analytical flow of operation is shown in Fig. 4.2. As in the RCA, incomplete carry propagation causes some errors in the RCPA, in addition to its intrinsic error. The main advantage of the RCPA is that the error decreases as the bit significance increases. That is, the error due to delay
variation during carry propagation is lower for the most significant bit.

4. Internal structure of RCPFA

The structure of the RCPFA is implemented from the K-map results for the sum $S_i$ and carry $C_i$ obtained from Eq. (4.2) together with the input forecast signal. The $S_i$ and $C_i$ signals are generated from the following Boolean relations between the inputs:

$S_i = \overline{C_{i+1}} F_i + \overline{C_{i+1}} A_i + \overline{C_{i+1}} B_i + A_i B_i F_i$ (4.3)

$C_i = C_{i+1} F_i + C_{i+1} \overline{A_i} + C_{i+1} \overline{B_i} + \overline{A_i}\,\overline{B_i} F_i$ (4.4)

When the power, area, and delay of the proposed multiplier are compared with those of the existing multiplier using a ripple carry adder, the power is almost the same in both multipliers, but the area and delay overheads are reduced in the proposed multiplier.

6. Conclusion

In this paper, the generate and propagate signals are used to modify the partial products of the multiplier in order to implement an efficient approximate multiplier. The altered partial products are approximated by a simple OR gate. An approximate full adder, half adder, 4-2 compressor, and approximate RCPFAs are proposed to reduce the remaining partial products. Propagating the carry of the reverse adder from MSB to LSB achieves greater stability in delay. In this approximate multiplier, approximation is applied only in the n−1 least significant part of the multiplier. The proposed multiplier architecture achieves a considerable reduction in delay and area compared with the standard one.

Using the approximate RCPFA in the approximate multiplier gives 21% and 7% improvements in area and delay, respectively. Compared with existing approximate multiplier designs, the proposed multiplier has better accuracy. Applications that need significant savings in power and area with minimal loss in output quality can use the proposed multiplier design. DSP applications of this multiplier include FIR filtering and DCT-based JPEG compression, in which its accuracy and efficiency can be evaluated.

As future work, the multiplier and the RCPFA can be implemented using pass-transistor logic and transmission gates. This would lead to a lower transistor count and power consumption, and to higher reliability and speed.

7. Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Arulkarthick V J, Ph.D, received his B.E. in Electrical and Electronics Engineering from Bharathiyar University, Coimbatore; ME in Applied Electronics and Ph.D in Information and Communication Engineering from Anna University, Chennai. His research interests are Signal and Image processing, VLSI Design, and VLSI signal processing. He is currently working as Assistant Professor (Selection Grade) at JCT college of Engineering, Coimbatore. He is a member of IEEE, ISTE.

Abinaya Rathinaswamy received her M.E. degree in VLSI Design and B.E. degree in Electronics and Communication Engineering from Anna University, Chennai. Her current research interests include VLSI Design and Signal Processing. She is currently working as Assistant Professor at JCT college of Engineering, Coimbatore.