
Microprocessors and Microsystems 86 (2021) 104333
Power and delay efficient FIR filter design using ESSA and VL-CSKA based booth multiplier
Aditya Mandloi a,*, Santosh Pawar b
a Department of Electronics Engineering, Medi-Caps University, Pigdamber, Rau, Indore, Madhya Pradesh 453331, India
b Department of Electronics and Communication Engineering, Dr. A. P. J. Abdul Kalam University, Indore, Madhya Pradesh, India
ARTICLE INFO

Keywords: FIR filter; Optimum FC; Enhanced squirrel search optimization (ESSA); Modified booth multiplier; Variable latency carry skip adder (VL-CSKA)

ABSTRACT

FIR filters play a major role in digital image processing applications. The power and delay performance of any FIR filter depends on the switching activities between the filter coefficients (FCs) and on the basic arithmetic operations (i.e., multiplication and addition) performed in the convolution equations. In this paper, a new FIR filter is designed using an Enhanced Squirrel Search Algorithm (ESSA) and a Variable latency Carry skip adder (VL-CSKA) based booth multiplier. The proposed ESSA algorithm selects an optimal FC by minimizing the switching activities of FC based on the ripple contents, power and Transition width parameter to meet the required specifications of FIR filter in the frequency domain. Also, the VL-CSKA based booth multiplier is proposed to reduce the delay of FIR filter with parallel addition of partial products (PPs). In this design, the VL-CSKA adders utilize variable size and compound gate-based skip logic to reduce the delay with low power. The proposed FIR filter is simulated in Xilinx working platform by developing Verilog coding. The simulation result shows that the proposed FIR filter outperforms the state-of-the-art FIR filters by consuming only 0.142 mW power with delay of 28.175 ns.
1. Introduction

Digital filters play an important role in Digital Signal Processing (DSP) applications. There are two types of digital filters, namely Finite Impulse Response (FIR) and Infinite Impulse Response (IIR) filters. The stability and linear phase characteristics of the FIR filter have led researchers to use FIR filters broadly in DSP applications, because it has no feedback and a linear phase response. The hardware usage of the entire FIR structures can be evaluated by implementing them on FPGA. A high-performance filter should consume less power and run faster to be fabricated on DSP platforms [1–4]. The FIR filter does not require any kind of feedback network because the output of the FIR filter depends on the present and past values of the input [5]. Moreover, the hardware complexity and the switching activities between FCs should be reduced to minimize the power consumption of the FIR filter. The switching activities can be reduced by considering the Hamming distance (HD) between successive coefficients. Optimization-based methods have been broadly utilized in this regard. Agarwal et al. [5] designed a High Pass (HP) FIR filter using real-coded genetic algorithm (RCGA) based on the L1 norm technique to improve the design accuracy. Kumar [6] developed a Cuckoo Search Algorithm (CSA) for an optimal design of fractional delay FIR filter. A new fitness function is created in that paper to design the FIR filter. Aggarwal et al. [7] used the two-dimensional (2D) L1 technique to design the 2D FIR filter. In that article, the quadrant symmetry concept is utilized to reduce the FCs. A Fractional Derivative Constraint (FDC) technique has been developed by Aggarwal et al. in [8] to design the 2D FIR filter. Also, hybrid particle swarm optimization and gravitational search algorithm (HPSO-GSA) has been used by this author to calculate the optimized FIR coefficients. Type-III and type-IV discrete cosine transform (DCT) was utilized to design a fractional order differentiator in FIR filter [9]. Moreover, power function and least-square were used to design a fractional order digital differentiator [10]. Optimal coefficient selection using the CSA method is developed by Kumar and Rawat in [11] for the FIR-FOD problem. A WLS fitness function is utilized in that article to enhance the response of FOD. Another FOD design is proposed by Barsainya et al. [12], which is based on lattice wave digital filter (LWDF). PSO, CSA and Genetic Algorithm (GA) are used to find the optimal LWDF coefficients.

Generally, the power and delay performance of the FIR filter design is mostly influenced by the adder and multiplier circuits. The execution time of the FIR filter design is usually dominated by the multipliers. Hence, high-speed multipliers are required to design delay efficient FIR filters [13]. Multiplication is generally realized using either AND-array multipliers or Booth multipliers. The AND-array multipliers developed
* Corresponding author.
E-mail address: aditya.mandloi@medicaps.ac.in (A. Mandloi).
https://doi.org/10.1016/j.micpro.2021.104333
Received 17 December 2020; Received in revised form 9 June 2021; Accepted 20 August 2021
Available online 24 August 2021
0141-9331/© 2021 Elsevier B.V. All rights reserved.
for M × M multiplication utilize AND gates to generate a PP matrix with M rows. In booth multiplier, the input is recoded for the production of signed multiples of the multiplicand during PP generation, so the number of rows in the PP accumulation matrix is reduced [14]. In general, the recoding process uses radix-4 Booth to recode the binary operand into radix-4 signed digits in the set {−2, −1, 0, 1, 2}. This recoding process is popularly used in booth multipliers because it needs only simple shifts and complementation for the generation of PPs [15]. In Booth multiplier, each stage waits for the output of the prior stage, hence the PPs are added serially [16]. This could increase the delay of the booth multiplier. Thus, there is a requirement of a parallel addition for the addition of PPs.

Nowadays, a number of delay efficient adders have been developed to satisfy the necessities of real time applications [17]. They include RCA, carry select adders (CSLA), carry skip adders (CSKA) and carry look ahead (CLA) adders. The structure of the RCA is simple, requires less area and consumes less power. But in this adder the path delay is very high. Thus, it is not suitable for high speed applications. A pair of sum words and carry bits is generated by the conventional CSLAs. Then, the sum and carry outputs are selected based on input carry using multiplexers. The hardware complexity of such circuits is increased due to the existence of dual RCAs [18–21]. The CLA adder provides a direct parallel prefix structure for the rapid generation of carry. In CLA, the propagate and generate signals are used for the earlier prediction of carry. The CSKA is one of the most efficient adders, which consumes less energy and decreases delay. This CSKA structure is used to improve the path delay of RCA [22,23].

In CSKA, a select line of the multiplexer is formed by combining the propagate signals obtained from each block. These select signals are used to obtain the carry-out signal either through rippling across the complete circuit or by by-passing the original carry-in towards the output. The multiplexer logic used in the conventional CSKA has a higher gate count [24,25]. Thus, the conventional CSKA which uses multiplexer logic consumes more power and increases the path delay. By combining both the incrementation and the concatenation schemes with the fundamental adder structure, the delay of CSKA gets improved [26]. Using an improved binary to two’s complement circuit and logic optimization techniques such as a multiplexer free VL-CSKA in the parallel structure of booth multiplier to add the encoded PPs can achieve improvement in delay and power compared to the conventional booth multiplier. Thus, we have planned to modify the booth multiplier by introducing a multiplexer free VL-CSKA in the parallel structure of booth multiplier to add the encoded PPs. The foremost contributions of this work are defined as follows:

• To find the optimal FCs by minimizing the pass band ripple (PBR), stop band ripple (SBR), power consumption and Transition band width using Enhanced Squirrel Search Algorithm (ESSA).
• To reduce the delay of the FIR filter design by introducing parallel addition of PPs in the booth multiplier design.
• To modify the booth multiplier using multiplexer free VL-CSKA for less power consumption.

The remainder of this research paper is ordered as follows: Section 1 gives an introduction to efficient FIR filter design. Section 2 describes recent works related to our proposed methodology. Section 3 elaborates the proposed optimized FIR filtering system. Section 4 illustrates the simulation outcomes and considerations of the proposed methodology, and Section 5 provides the conclusion as well as future enhancement of the proposed FIR filter design.

2. Related works

2.1. Certain recent related works are listed as follows

Dwivedi et al. [27] had proposed a Modified artificial bee colony (MABC) to determine the FC with the amplitude response of desired and designed filters. Here, the FIR filter had been designed with distributed arithmetic (DA) to determine the coefficient with predefined frequency domain specifications. Coefficient multiplication adders (CMAs) and Structural adders (CAs) were implemented to shift and add the coefficients. The FIR filter had been designed with hardware complexity with less computational time. Srivatsan and Venkatesan [28] aimed to implement an effective FIR filter design with Farrow structure. The proposed method had been implemented with low objective functions and it mainly depends on adder and subtracter to design the FIR filter with the required frequency response. Brain Storm Optimization (BSO) and Artificial Bee Colony (ABC) algorithm have been combined as a hybrid Brain Storm-Artificial Bee Colony (BSABC) algorithm to determine the FC with the minimization of objective problems. Here, the FIR filter had been designed with windowing method and frequency sampling in order to minimize the magnitude frequency with the designed filter. BSABC had been used to tune the optimal FCs.

Once the FC had been selected, various multipliers with adders had been implemented to design the FIR filter with low power consumption. Patali and Kassim [29] increased the performance of the FIR filter by using retiming and a modified CSLA adder structure. Here, the concepts of CLA and CSKA adders were combined to form a new model for concatenated CSLA. The module carry generation block was designed to generate and transfer the end module carries rapidly. This modified adder was used to add the PPs of the booth multiplier. Prasad et al. [30] had introduced a FIR filter with pipelined multiplier (FIR-PVM) and adder. The multiplication process had been performed hierarchically. The work had been established with the Urdhva Triyagbhyam Vedic technique, and the path delay and step up violation had been minimized with pad sensors. Here, the delay between the PP generator and the adder was added to compensate for the combination delay of the Vedic multiplier.

Xue et al. [31] had used a low power delay product-based booth multiplier. In conventional booth multiplier, the PP and the adder were defined to reduce the delay in addition with booth encoders. B2C complement requires an inverter and this inverter had been used to eliminate the delay with high power consumption. Radhakrishnan and Themozhi [32] used a FIR with XOR MUX full adder and truncation multiplier to implement the FIR filter. The truncation multiplier reduced the area of the FIR structure on the basis of a rounding approach. Also, the number of logic gates was reduced using XOR-MUX based Full adders. In this work, the Multiple Constant Multiplication process of the FIR filter was replaced with this XOR MUX truncation multiplier to increase the performance. The main drawback of the proposed adder is that the critical path delay was very high.

Kumar et al. [33] designed a FIR filter using a Modified Russian peasant multiplier (MRPM) with a square root CSLA (SQRT CSLA). This adder had been implemented to shrink the hardware complexity of the multiplier and MAC unit. The working principle of this RPM multiplication process was on the basis of the ‘Multiply-Divide’ rule. Here, the left and right shift operations were utilized for multiplying and dividing the digital inputs. In the modified structure, the left shift operations were used to generate the PPs. Then, the PPs were reordered using the Reduced Wallace tree generation (RWTG) process. Finally, the addition operation was carried out using SQRT CSLA. The proposed adder takes more time to execute the addition operation.

Sumalatha et al. [34] developed a FIR filter by modifying CSLA using Binary to Excess-1 Converter (BEC). Here, the conventional CSLA used BEC in the place of the RCA with carry input ‘one’ to improve the performance of the FIR filter in terms of area, power and delay. Paliwal et al. [35] analyzed the performance of Fast FIR algorithm (FFA) using different adders and multiplier designs. The authors had proposed FFA with Carry select adder (CSeLA) and Vedic multiplier. They selected the add and shift method, Vedic multiplier and booth multiplier for multiplier designs. Also, carry select adder, carry save adder and Han-Carlson adder were selected for adder analysis. They gave a comparative analysis for these designs in terms of area, delay and power consumption.
From the above analysis, it is clear that the existing approaches considered either an optimal filter coefficient selection or hardware complexities such as adder and multiplier to design the FIR filter. But there was a requirement to introduce a new method that considers both the optimal FC selection methods and hardware complexities to design a FIR filter. Also, it shows that the delay had been increased in conventional booth multipliers with the development of radix-2 encoder and due to the complexity of the PP. By observing the above methods, we sense that this is a relevant moment to put forward a proposed method. In this paper, a novel technique has been introduced to improve the power and delay efficiency of FIR filter design.

3. Proposed FIR filter design

An FIR filter has an impulse response of finite duration and it is widely used in signal processing domains such as image processing, audio or video processing and medical processing. In our proposed work, the FIR filter is designed with a carry skip adder, and the major aim is to design the filter with high speed and low power. FIR filter may be designed in continuous and discrete time and the major advantage is that it is able to filter the redundant noise present in the signal. The general equation of FIR filter is given below:

$$v(m) = \sum_{i=0}^{M-1} h(i)\, p(m - i) \qquad (1)$$

Here, v(m) signifies the filter output, p represents the input data given to the filter, h denotes the FC and M symbolizes the number of FC. Adders and multipliers are the main components to design the FIR filter. The normal structure of FIR filter with our proposed adders, delay and multipliers is shown in Fig. 1.

Nowadays, researchers face a lot of design challenges while implementing FIR filter design. Several design challenges are minimization of PBR, SBR and low power consumption. The minimization of both PBR and SBR is not possible in case of smaller order FIR filters. If the filters are optimized based on frequency domain characteristics, they will consume more power. Thus, there is a requirement of an efficient and fast convergence optimization algorithm.

The FIR filter’s frequency response must retain low ripple in both pass band and stop band. In this paper, the objective function is formed on the basis of deviation between the actual and desired frequency response to minimize the ripples in FIR filter design. The FIR FCs should be reduced to minimize the power consumption of FIR filter. Thus, we proposed ESSA to obtain optimal FCs. This algorithm mainly aims to provide optimal FCs by minimizing the pass band and stop band ripples along with the reduction in power consumption. Delay in FIR filter mostly depends on PP and it consumes more power. Normally, Booth multiplier is introduced to improve the delay by diminishing the PPs, but it depends on the previous stage output as an input to the current stage. So, the delay will get raised. In our proposed work, the delay can be reduced by introducing a parallel addition of PP with booth multiplier design. In the conventional CSKA (Con CSKA), the stages are fixed, so the power is not reduced; it is also slow and uses multiplexers. By using multiplexers, the delay will not get reduced. So, in our work the power consumption is reduced by modifying the Con CSKA into VL-CSKA, because it uses variable sizes in each stage. Thus we are applying variable sizes to each stage in order to reduce the delay. Then, FIR filters are implemented in FPGA, using the optimized coefficients.
Fig. 1. Block diagram of FIR filter.
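
As a minimal sketch of Eq. (1), assuming it is the usual convolution sum in which tap i multiplies the input delayed by i samples and samples before the start of the signal are taken as zero, the direct-form filter can be written in a few lines of Python. The function name `fir_filter` and the three-tap coefficient set are illustrative and are not taken from the paper.

```python
def fir_filter(h, p):
    """Direct-form FIR filter: v(m) = sum over i of h(i) * p(m - i).

    h: list of filter coefficients (FCs), length M.
    p: list of input samples; p(k) is treated as 0 for k < 0.
    """
    out = []
    for m in range(len(p)):
        acc = 0.0
        for i, coef in enumerate(h):
            if m - i >= 0:          # skip samples before the start of the input
                acc += coef * p[m - i]
        out.append(acc)
    return out


# Feeding a unit impulse returns the coefficients themselves.
h = [0.25, 0.5, 0.25]               # hypothetical coefficients
impulse = [1.0, 0.0, 0.0, 0.0]
print(fir_filter(h, impulse))       # [0.25, 0.5, 0.25, 0.0]
```

Each output sample needs M multiplications and M − 1 additions, which is why the multiplier and adder structures dominate the delay and power of the hardware filter.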

3.1. FIR filters design with optimum coefficient using ESSA

To design a FIR filter, there is a challenge to suppress the pass band and stop band ripple with low power consumption. So, there is a need to select an optimal FC to design a FIR filter without any distortions. In our proposed work, the FIR filter has been designed with ESSA to determine the optimized FC for the objective function based on the absolute frequency response of the FIR filter.

SSA [36] can be used to solve unimodal, multimodal and multi-dimensional optimization problems. To achieve high global search ability, ESSA is implemented. An improved predator presence probability (Pdp) is used in ESSA to balance the capabilities of exploration and exploitation. Initially, the FC can be selected randomly and it is defined in a matrix format:

$$x_{uv}(A) = \begin{bmatrix} h_{11} & \cdots & h_{1C} \\ \vdots & \vdots & \vdots \\ h_{B1} & \cdots & h_{BC} \end{bmatrix}_{B \times C} \qquad (2)$$

C represents the number of FC to be optimized randomly.

In order to evaluate the best FC, the objective functions such as PBR, SBR, power consumption and Transition width (TW) can be framed in terms of the desired and designed FIR filter. The objective function of PBR and SBR can be evaluated as:

$$Y_1 = \max_{\omega \le \omega_p}\left(|x(\omega)| - \lambda_p\right) + \max_{\omega \ge \omega_p}\left(|x(\omega)| - \lambda_s\right) \qquad (3)$$

$\lambda_p$ and $\lambda_s$ represent the desired PBR and SBR, and $x(\omega)$ is a deviation which is evaluated from the following equation.

$$x(\omega) = N(\omega)\left[H_d(\omega) - H(\omega)\right] \qquad (4)$$

$H_d(\omega)$ and $H(\omega)$ denote the amplitude responses of the desired and designed filters, and the FC $H_c(\omega)$ can be written as:

$$H_c(\omega) = H_c(z) \qquad (5)$$

Here, ω = Nπ. The main aim of the FIR filter is to reduce the delay by obtaining the FC, and the objective function of PBR and SBR can be determined in terms of FC, so Eq. (3) can be rewritten as:

$$Y_2 = \sum_{m=0}^{M_p} \mathrm{abs}\left[\mathrm{abs}\left(\left|H_p\left(\frac{\pi m}{N}\right)\right| - 1\right) - \lambda_p\right] + \sum_{m=0}^{M_p} \mathrm{abs}\left[\mathrm{abs}\left(\left|H_s\left(\frac{\pi n}{N}\right)\right|\right) - \lambda_s\right] \qquad (6)$$

In order to minimize the power in FIR filter design, the objective function is established with respect to HD and it is defined as:

$$Y_3 = \sum_{j=1}^{M} \pi\left(Coef_j, Coef_{j+1}\right) \qquad (7)$$

Here, M represents the order of the filter and $\pi(Coef_j, Coef_{j+1})$ denotes the HD among the FC with binary values. If the FIR filter is designed with the same FCs, the HD and the power consumption will get minimized. The HD is calculated from the below equation:

$$H(D_1, D_2) = \frac{1}{m} \sum_{j=1}^{m} \left|d_1^{j} - d_2^{j}\right| \qquad (8)$$

$D_1 = (d_1^1, d_1^2, \ldots, d_1^j)$ and $D_2 = (d_2^1, d_2^2, \ldots, d_2^j)$ represent the binary strings. The HD is calculated using the binary string in floating point representation. The objective function of TW can be declared by the pass and ripple band and is evaluated as:

$$Y_4 = \sum_{m=0}^{M_p} \mathrm{abs}\left[\mathrm{abs}\left(\left|H_p\left(\frac{\pi m}{N}\right)\right| - 1\right) - \lambda_p\right] + \frac{K\pi}{N}\left(N_s - N_p\right) + \sum_{n=1}^{N_s} \mathrm{abs}\left[\mathrm{abs}\left(\left|H_s\left(\frac{\pi n}{N}\right)\right| - 1\right) - \lambda_s\right] \qquad (9)$$

K represents a constant value and it is used to give a weightage to TW.

The above equation represents the PBR, SBR and TW and it is used to determine the FC with the optimization algorithm. After evaluating the FC, they should be sorted in ascending order to find the best FC. To update the position of a coefficient, the ESSA algorithm is defined in three cases, corresponding to three shifted positions. By shifting from the current position to the next position, a new position is obtained with the best FC, as explained below:

Case 1: If the FC is exchanged from the first position ($Fc^{1st}$) to the second position ($Fc^{2nd}$), then the current location of the coefficient is obtained by the below described method:

$$Fc^{1st} = \begin{cases} Fc_t^{1st} + d_g \cdot g_c \cdot \left(Fc_t^{2nd} - Fc_t^{1st}\right), & R_1 \ge P_{dp} \\ C_x\left(Fc_t^{1st}, E, He\right), & \text{otherwise} \end{cases} \qquad (10)$$

Here, $d_g$ denotes the gliding distance, $R_1$ represents a random number in the range between 0 and 1, $g_c$ represents the gliding constant and t denotes the current iteration. The balance of exploration and exploitation is achieved with the help of the gliding constant, whose value is set to 1.9. E represents entropy and it is implemented as radius, and He represents hyper entropy with He = 0.1.

Case 2: If the FC is exchanged from the second position ($Fc^{2nd}$) to the third position ($Fc^{3rd}$), then the present location is determined as:

$$Fc^{2nd} = \begin{cases} Fc_t^{2nd} + d_g \cdot g_c \cdot \left(Fc_t^{3rd} - Fc_t^{2nd}\right), & R_2 \ge P_{dp} \\ C_x\left(Fc_t^{2nd}, E, He\right), & \text{otherwise} \end{cases} \qquad (11)$$

$R_2$ denotes a random number within the range of 0 and 1.

Case 3: If the position of the FC is changed from the third position ($Fc^{3rd}$) to the first position ($Fc^{1st}$), then the present location is decided as:

$$Fc^{3rd} = \begin{cases} Fc_t^{3rd} + d_g \cdot g_c \cdot \left(Fc_t^{1st} - Fc_t^{3rd}\right), & R_3 \ge P_{dp} \\ C_x\left(Fc_t^{3rd}, E, He\right), & \text{otherwise} \end{cases} \qquad (12)$$

In general, the value of Pdp (predator presence probability) is defined as 0.1 and the random value is in the range between 0 and 1. In order to enhance the exploitation of SSA, an adaptive Pdp is implemented with the iteration number and it is given as:

$$P_{dp} = \left(P_{dp\,max} - P_{dp\,min}\right)\left(1 - Iter/Iter_{max}\right)^{10} + P_{dp\,min} \qquad (13)$$

$P_{dp\,max}$ and $P_{dp\,min}$ denote the maximum and minimum predator presence probability.

When a new position is created, the new position may also be worse than the existing (old) position. So it is important to check all the updated new positions. If the selected new position is worse, then the old position is kept until a better value is received. The best fitness can be calculated mathematically by the below equation.

$$Fc_i = \begin{cases} Fc_i^{current}, & \text{if } fc_i^{current} < fc_i^{existing} \\ Fc_i^{existing}, & \text{otherwise} \end{cases} \qquad (14)$$

Finally, the best FC is generated based on Eq. (14):

$$Fc_{best,i}^{current} = C_x\left(Fc_{best,i}^{current}, E, He\right), \quad i = 1, 2, 3 \ldots n \qquad (15)$$

After evaluating the current best FC, the same procedure is repeated for n iterations until the global optimal FCs are chosen. After finding the optimized FC, the modified booth multiplier with parallel architecture is introduced to minimize the delay and power consumption using VL-CSKA. Pseudo code representation of ESSA technique to determine the optimized FC is given in Algorithm 1.
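
The power objective of Eqs. (7) and (8) can be illustrated with a short Python sketch. It assumes, purely for illustration, that each coefficient is quantized to a 16-bit two's-complement fixed-point word with 15 fractional bits (the paper states only that binary strings are used); `to_bits`, `hamming` and `switching_cost` are hypothetical helper names.

```python
def to_bits(value, width=16, frac_bits=15):
    """Quantize a coefficient to signed fixed point (an assumed format) and
    return its two's-complement bit pattern as a non-negative integer."""
    q = int(round(value * (1 << frac_bits)))
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    q = max(lo, min(hi, q))                 # saturate to the representable range
    return q & ((1 << width) - 1)


def hamming(d1, d2, width=16):
    """Eq. (8): fraction of bit positions in which the two words differ."""
    diff = (d1 ^ d2) & ((1 << width) - 1)
    return bin(diff).count("1") / width


def switching_cost(coefs, width=16, frac_bits=15):
    """Eq. (7): sum of Hamming distances between successive coefficients."""
    words = [to_bits(c, width, frac_bits) for c in coefs]
    return sum(hamming(a, b, width) for a, b in zip(words, words[1:]))


# 0.5 -> 0x4000 and -0.5 -> 0xC000 differ in one of 16 bits.
print(switching_cost([0.5, -0.5, -0.5]))    # 0.0625
```

A coefficient set whose neighbouring words share most of their bits yields a small cost, which is the property the optimizer rewards when it trades ripple against power.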
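
The position update of Eqs. (10) to (14) can be sketched as follows. This is an illustrative outline, not the authors' implementation: the cloud generator Cx is assumed here to be the standard normal cloud model (a Gaussian draw whose spread is itself drawn from a Gaussian with mean E and standard deviation He), and the function names, the value of dg, the value of E and the Pdp bounds in the usage lines are hypothetical.

```python
import random


def adaptive_pdp(iteration, max_iterations, pdp_max, pdp_min):
    """Eq. (13): Pdp decays from pdp_max towards pdp_min as iterations advance."""
    return (pdp_max - pdp_min) * (1 - iteration / max_iterations) ** 10 + pdp_min


def cloud_generator(center, entropy, hyper_entropy, rng):
    """Assumed normal cloud model standing in for Cx(Fc, E, He)."""
    spread = abs(rng.gauss(entropy, hyper_entropy))
    return [rng.gauss(c, spread) for c in center]


def update_position(current, target, dg, gc, pdp, entropy, hyper_entropy, rng):
    """Eqs. (10)-(12): glide towards the target when R >= Pdp, else resample."""
    if rng.random() >= pdp:
        return [c + dg * gc * (t - c) for c, t in zip(current, target)]
    return cloud_generator(current, entropy, hyper_entropy, rng)


def keep_better(existing, candidate, fitness):
    """Eq. (14): accept the candidate only if its fitness is strictly lower."""
    return candidate if fitness(candidate) < fitness(existing) else existing


rng = random.Random(0)
pdp = adaptive_pdp(iteration=10, max_iterations=100, pdp_max=0.1, pdp_min=0.001)
candidate = update_position([0.0, 1.0], [1.0, 1.0], dg=0.5, gc=1.9, pdp=pdp,
                            entropy=1.0, hyper_entropy=0.1, rng=rng)
best = keep_better([0.0, 1.0], candidate, fitness=lambda fc: sum(abs(x) for x in fc))
print(pdp, candidate, best)
```

The same `update_position` call covers all three cases by changing which pair of positions is passed as `current` and `target`; in a full optimizer the `fitness` argument would combine Y1 to Y4 rather than the placeholder used here.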
Algorithm 1
ESSA for optimal selection of FC.

Begin
  Initialize all the parameters related to the ESSA algorithm and parameters of FIR filter
  Randomly select the FC and define it as a dimensional matrix format using (2).
  Estimate the objective function for each FC by calculating PBR, SBR and TW using Eqs. (3), (6), (7) and (9).
  Sort the FC values in ascending order to obtain optimal FC.
  while iter < itermax
    for case 1:
      if R1 ≥ Pdp
        Find the current location of coefficient by Eq. (10)
      else
        Compute Cx(Fc1st, E, He).
      end if
    end for
    for case 2:
      if R2 ≥ Pdp
        FC exchanged from second position to third position using Eq. (11).
      else
        Calculate Cx(Fc2nd, E, He)
      end if
    end for
    for case 3:
      if R3 ≥ Pdp
        Position of FC changed from third to first position using Eq. (12).
      else
        Compute Cx(Fc3rd, E, He)
      end if
    end for
    Evaluate the fitness using Eq. (14)
    Sort the fitness according to ascending order.
  end while
  Return optimal FC value.
Stop.

3.2. Modified booth multiplier with VL-CSKA

FIR filters are used in many applications; they require multiplication, which generates many PPs. A large number of products increases the delay between the stages in FIR filter designs. A number of adders were used in previous works to reduce the power and delay, such as RCA, CSLA, CSKA and PPAs. RCA provides a low critical path with more power consumption. CSLA consumes more power and increases the speed. In conventional booth multiplier, the current phase needs to wait for the previous phase output. So, the delay will get increased. Instead of series addition, parallel addition is introduced in booth multiplier in order to minimize the delay and this could be optimized with the help of modified booth multiplier with VL-CSKA. In our proposed work, a radix 4 booth multiplier is developed based on VL-CSKA to reduce the delay. The proposed structure of VL-CSKA for 8 digit input is shown in Fig. 2.

The architecture contains a 4 to 1 MUX, three optimized booth encoders and three VL-CSKAs. The 10 bit VL-CSKA contains six stages, the 9 bit VL-CSKA contains five stages and the 11 bit VL-CSKA contains seven stages. The delay can be reduced by the parallel structure for the addition of encoded PPs. In our work, an adder is implemented to merge the PPs. In the proposed booth multiplier, a 4 to 1 MUX is used to generate the first PP (PP1) and the remaining second, third and fourth PPs are generated with booth encoders. The booth multiplier consists of four PPs: PP1, PP2, PP3 and PP4. All these PPs are added with parallel operation. The first two PPs, PP1 [9:0] and PP2 [8:0], are added with the 10 bit VL-CSKA. Secondly, PP3 [8:0] and PP4 [8:0] are added with the 9 bit VL-CSKA. In stage one, two parallel additions take place and in the second stage the results of the previous additions are added to finally produce the sum.

The worst path delay is highlighted in Fig. 2 and it is directed as follows: the worst path runs from B2C (Binary to 2's complement) to the booth encoder, from the booth encoder to the 10 bit VL-CSKA placed in the first stage, and ends with the 11 bit VL-CSKA adder in the second stage of parallel addition. B2C is one of the important components in booth multiplier in order to invert the bits and generate (-X). In B2C conversion, first the input bits are inverted to form the 1's complement and then a single bit is added to the 1's complement to generate the 2's complement. In the proposed booth multiplier, PP1 [9:0] is produced with the 4 to 1 MUX. PP1 is defined as (0) when Q [1:0] = 00; if Q [1:0] = 01, then PP1 is (P); else if Q [1:0] = 10, PP1 is (−2P); else if Q [1:0] = 11, PP1 is (−P). The 4 to 1 MUX produces the output PP1 [9] with sign extension. PP1 is generated with the 4 to 1 MUX and the remaining PPs are handled by the booth encoders to produce the second and third PP.

3.2.1. Optimized booth encoder

Booth encoder is an essential module in booth multiplier. Booth multiplier is implemented to diminish the number of PPs with the help of a radix-4 booth encoder. Generally, only the positive multiples (P, 2P) are generated in PPs and the negative multiples (−P, −2P) are generated by a subtraction process. Both adder and subtracter are then needed to generate PPs, whereas in our method only an adder is used to generate PP. In our proposed modified booth encoder, the bubble pushing technique is implemented to generate both positive and negative multiples (P, 2P, −P, −2P).

The bubble pushing method produces four scalars, L, M, N and O, which are shown in Table 1. Based on input bits Q, scalar (L) is developed to create (P) as a PP. In order to generate (2P) as a PP, scalar (M) is developed in accordance with the input bits Q. Scalar (N) is defined from the input bits to produce (−2P) as a PP. Scalar (O) is established based on input bits and it generates (−P) as a PP. Based on the scalars (L, M, N, O), the modified booth encoder is implemented as shown in Fig. 3. Booth encoder requires a scalar to produce a result with less delay and area.

L = ( ∼ ((Q0 ⊕ Q1 / Q2 ) (16)
M = ( ∼ ((Q0 ⊕ Q1 / Q2 ) (17)
N = ((Q0 / Q1 ) &Q2 ) (18)
O = ((Q0 ⊕ Q2 )&Q2 ) (19)

3.2.2. VL-CSKA

In CSKA, the process of addition is executed in multiple stages. The data is separated into various blocks with variable bit size to finalize the sum without delay. VL-CSKA is implemented to reduce the delay path by adding an RCA block with an incrementation block, as described in Fig. 2. The multiplexers of the conventional CSKA are replaced by carry skip logic built from AOI and OAI compound gates. By allocating various sizes of bits to an individual block, the speed and delay of the CSKA are improved. The first and last RCA blocks are given a bit size of one to diminish the delay. In our work, the carry input given to each RCA block is defined as zero to reduce the delay with better performance. Hence, the RCA blocks execute simultaneously because they do not require a previous stage output as the input of the present stage. The carry from each RCA block is propagated through skip logic.

In each stage, an RCA block is included with variable bit size. In the VL-CSKA adder the size of the first RCA block is one, then the size increases from the first to the nucleus stage, and after the nucleus stage the size is reduced and ends with one. The stage with the largest size is declared as the nucleus stage. The size is declared in accordance with the delay performance of carry output and summation of RCA block from the second stage to nucleus stage. The incrementation of size continues until the sizes become larger than N/2. In our proposed structure, except the first stage all other stages include two blocks. The carry output from the previous stage and the intermediate result from RCA blocks are given to the increment block to estimate the concluding sum of each stage. The carry out is propagated to skip logic; in one stage AND-OR-Invert (AOI) is used as skip logic and in the next stage OR-AND-Invert (OAI) is used as skip logic. Here, the carry is complemented when it is propagated through the skip logic.
Fig. 2. Design of modified booth multiplier with VL-CSKA adder.
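
The block structure described for the VL-CSKA can be mimicked with a small behavioural model in Python. It is a sketch under stated assumptions rather than the circuit of Fig. 2: each block adds its operand slices with carry-in fixed to zero, an increment step then adds the real incoming carry, and the skip logic forms the outgoing carry as generate OR (propagate AND carry-in). The alternating AOI/OAI inversion is not modelled, and the block sizes `[1, 2, 3, 2, 1, 1]` for the 10 bit adder are hypothetical, chosen only to give six stages that grow towards a nucleus and end with one.

```python
def vl_cska_add(a, b, block_sizes, cin=0):
    """Behavioural carry-skip addition with variable block sizes.

    a, b: non-negative operands that fit in sum(block_sizes) bits.
    Returns (sum, carry_out).
    """
    result, carry, pos = 0, cin, 0
    for size in block_sizes:
        mask = (1 << size) - 1
        xa, xb = (a >> pos) & mask, (b >> pos) & mask
        partial = xa + xb                       # RCA block with carry-in fixed to 0
        inter_sum = partial & mask              # intermediate sum of the block
        generate = partial >> size              # block carry-out when carry-in is 0
        propagate = 1 if (xa ^ xb) == mask else 0   # every bit position propagates
        block_sum = (inter_sum + carry) & mask  # increment block adds the real carry
        carry = generate | (propagate & carry)  # skip logic, inversion not modelled
        result |= block_sum << pos
        pos += size
    return result, carry


# 511 + 1 ripples a carry through five blocks of the 10 bit adder.
print(vl_cska_add(511, 1, [1, 2, 3, 2, 1, 1]))  # (512, 0)
```

Because every block computes `partial` without waiting for its neighbour, only the short `carry` chain is sequential, which is the source of the delay saving claimed for the increment-based structure.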
Hence, the even stage skip logics produce the complement of the carry. The advantage of using this model is that the carry output of the next stage can be found without requiring the carry from the incrementation block, so the delay gets reduced and it also consumes less power to give a better performance.

Table 1
Radix-4 8-bit booth encoding scheme.

Q2i+1   Q2i   Q2i−1   PP
0       0     0       0
0       0     1       +P
0       1     0       +P
0       1     1       +2P
1       0     0       −2P
1       0     1       −P
1       1     0       −P
1       1     1       0

4. Result and discussion

The proposed FIR filter design has been simulated in the Xilinx ISE 14.5 tool using Verilog coding. The proposed FIR filter has been designed by selecting an optimal coefficient with ESSA. After selecting the optimal FC, the FIR filter has been designed using the VL-CSKA based multiplexer free booth multiplier. Here, a collection of fixed amplitude and variable frequency sinusoidal signals has been generated and applied to the Virtex-7 FPGA in the Xilinx ISE 14.5 tool for validating the proposed FIR filter’s low pass characteristics. To validate the performance of the proposed FIR filter design with optimal coefficient selection algorithm using ESSA and VL-CSKA based booth multiplier, an analysis has been conducted in two phases. The first phase analyzed the effectiveness of the proposed FIR filter by comparing it with different metaheuristic approaches to prove the effectiveness of the optimal FC selection process using ESSA. The second phase analyzed the hardware complexity of the proposed FIR filter by considering different multiplier and adder structures.

The proposed 24-tap FIR filter has been designed by considering the following specification: normalized pass band and normalized stop band frequencies of 0.45 rad/s and 0.55 rad/s, and normalized PBR and normalized SBR of 0.01 and 0.01. Initially, the FCs have been initialized randomly by ESSA. After that, they are updated by considering the minimization objectives presented in Section 3.1. The optimal FCs obtained for the 24-tap FIR filter using ESSA and other optimization approaches like PSO, ABC, MABC [27] and BSABC [28] have been listed in Table 2 to show the performance of the proposed filter design. Here, PSO and ABC algorithms are implemented to obtain the coefficient value.

After determining the optimal FCs, they were used to simulate the frequency magnitude response of the FIR filter in decibel (dB). Different metaheuristic approaches provide different frequency magnitude
Fig. 3. Modified booth encoder.
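The recoding in Table 1 can be illustrated with a short behavioural sketch. It is written in Python rather than Verilog purely for readability and is not the authors' encoder circuit: the function names `booth_radix4_digits` and `booth_multiply` are hypothetical, and the partial products are summed sequentially instead of with the parallel VL-CSKA stages of the proposed design.

```python
# Behavioural sketch of the radix-4 Booth recoding of Table 1.
# Assumption: function names and the software structure are illustrative only.

# (Q2i+1, Q2i, Q2i-1) -> multiple of the multiplicand P, as listed in Table 1.
BOOTH_TABLE = {
    0b000: 0, 0b001: +1, 0b010: +1, 0b011: +2,
    0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0,
}


def booth_radix4_digits(q, bits=8):
    """Recode a signed `bits`-wide multiplier q into radix-4 Booth digits.

    Returns digits d_i in {-2, -1, 0, +1, +2}, least significant first,
    such that q == sum(d_i * 4**i).
    """
    if bits % 2:
        raise ValueError("bits must be even for radix-4 recoding")
    if not -(1 << (bits - 1)) <= q < (1 << (bits - 1)):
        raise ValueError("multiplier does not fit in the given width")
    pattern = q & ((1 << bits) - 1)   # two's-complement bit pattern of q
    pattern <<= 1                     # append the implicit bit Q(-1) = 0
    # Each overlapping 3-bit window selects one row of Table 1.
    return [BOOTH_TABLE[(pattern >> (2 * i)) & 0b111] for i in range(bits // 2)]


def booth_multiply(p, q, bits=8):
    """Multiply p by q by summing the Booth partial products d_i * P * 4**i."""
    acc = 0
    for i, digit in enumerate(booth_radix4_digits(q, bits)):
        acc += (digit * p) << (2 * i)   # partial product shifted by 2 bits per digit
    return acc
```

For an 8-bit multiplier this yields four partial products instead of eight, which is the reduction that the radix-4 scheme relies on.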
Table 2
Optimal FCs of 24-tap FIR filter using different metaheuristic approaches.
h(n)            PSO       ABC       MABC [27]   BSABC [28]   Optimized values selected by ESSA
h(0) = h(23)    0.0197    −0.0192   0.0288      0.0390       −0.0023
h(1) = h(22)    −0.0242   −0.0044   0.0474      −0.0204      −0.0276
h(2) = h(21)    −0.0482   0.0237    0.0058      −0.0390      −0.0446
h(3) = h(20)    0.0641    0.0243    −0.035      0.0798       0.0413
h(4) = h(19)    0.0686    −0.0239   0.0013      0.0874       0.0541
h(5) = h(18)    −0.0401   −0.0419   0.0614      −0.0427      0.0312
h(6) = h(17)    0.0014    0.0348    0.0039      0.0182       0.0155
h(7) = h(16)    −0.0487   0.1117    −0.1000     −0.0540      −0.0418
h(8) = h(15)    −0.1492   −0.0293   0.0029      −0.1410      −0.1472
h(9) = h(14)    0.0335    −0.3102   0.3186      0.0438       0.0413
h(10) = h(13)   0.2386    −0.4577   0.5048      0.2294       0.2172
h(11) = h(12)   −0.0228   0.3109    −0.0179     −0.0241      −0.0324
h(12)           0.6853    0.5000    0.9821      0.6821       0.8712
Fig. 4. Magnitude response of 24-tap FIR filter using different metaheuristic approaches.
Fig. 5. Convergence analysis.
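A magnitude response such as the one plotted in Fig. 4 can be evaluated directly from a coefficient set. The following is a minimal sketch and not the authors' simulation flow: the helper names are hypothetical, frequencies are assumed to be in rad/sample with π as the Nyquist frequency, and the stop-band figure is measured relative to the DC gain.

```python
import cmath
import math


def symmetric_taps(half):
    """Mirror h(0)..h(N/2-1) into an even-length linear-phase tap list."""
    return list(half) + list(reversed(half))


def magnitude_db(h, omega):
    """Return |H(e^{j*omega})| in dB for taps h at omega (rad/sample)."""
    response = sum(c * cmath.exp(-1j * omega * n) for n, c in enumerate(h))
    # Floor the magnitude so exact zeros of the response do not break log10.
    return 20 * math.log10(max(abs(response), 1e-12))


def stopband_attenuation_db(h, stop_edge, points=512):
    """Worst-case gain over [stop_edge, pi] relative to DC, as positive dB."""
    step = (math.pi - stop_edge) / (points - 1)
    worst = max(magnitude_db(h, stop_edge + k * step) for k in range(points))
    return magnitude_db(h, 0.0) - worst
```

In use, the twelve values h(0) to h(11) of one column of Table 2 would be passed through `symmetric_taps` and the result evaluated over a frequency grid. The mapping of the paper's normalized band edges onto `omega` is an assumption that has to be fixed before comparing against Fig. 4.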
response, and they are illustrated in Fig. 4. This figure shows that the response of the proposed ESSA approach contains only minor variations, with a stop-band attenuation of nearly 60 dB for the 24-tap FIR filter. The convergence behavior of the proposed and existing methods is displayed in Fig. 5. Compared to the existing methods, the proposed ESSA approach provides optimal FCs with fewer ripples in a smaller number of iterations. This is due to the exploration and exploitation behavior of ESSA: the position updating equations are combined with an adaptive step size strategy, which helps to find the optimal FCs quickly. As a result, the proposed algorithm converges within a small number of iterations.
After selecting the optimal FCs using ESSA, the FIR filter has been designed using the VL-CSKA based Booth multiplier. The RTL views of the proposed FIR filter of order 24 and of the VL-CSKA based Booth multiplier are shown in Fig. 6. In the proposed FIR filter design, the optimal coefficients are selected using the ESSA module. After that, the VL-CSKA based Booth multiplier and the VL-CSKA are used to build the FIR filter. The multiplier requires Booth encoder modules and VL-CSKAs, and it adds the PPs using two parallel addition stages.
The effectiveness of the proposed VL-CSKA based Booth multiplier has been verified by comparing it in Table 3 with existing multipliers, including the conventional Booth multiplier, the modified Booth multiplier [37] and the low delay-product Booth multiplier [31]. The area and delay of the conventional Booth multiplier, which was designed by the authors for this comparison, are large because it uses a radix-2 encoder. The modified Booth multiplier instead uses radix-4 encoding to improve its performance, and a radix-4 encoder gives a good outcome when delay and area are considered simultaneously. The low delay-product Booth multiplier, taken from [31], performs better than the modified Booth multiplier because it includes a B2C converter and a 2:1 multiplexer, which avoid the 15-bit adder/subtractor required by the modified Booth multiplier. However, every stage of both methods depends on the outcome of the preceding stage, which increases the delay of the multiplication process. Hence, the proposed multiplier outperforms the existing multiplier circuits because it introduces parallel addition of PPs into the Booth multiplier design. The
Fig. 6. RTL schematic of the proposed work (a) Proposed VL-CSKA based booth multiplier (b) Proposed FIR filter.
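The carry-skip principle used by the adders in Fig. 6(a) can be sketched behaviourally as follows. This is a functional model only, under stated assumptions: the function name `carry_skip_add` and the block sizes in the usage note are hypothetical, and the model does not represent the compound-gate skip logic, the complemented carries, or the timing of the actual VL-CSKA.

```python
def carry_skip_add(a, b, block_sizes, cin=0):
    """Add two unsigned integers block by block with carry skipping.

    block_sizes lists the bit width of each block, least significant first
    (variable sizes are allowed). Returns (sum, carry_out, skipped_blocks).
    """
    result, carry, position, skipped = 0, cin, 0, 0
    for width in block_sizes:
        mask = (1 << width) - 1
        block_a = (a >> position) & mask
        block_b = (b >> position) & mask
        total = block_a + block_b + carry          # ripple addition inside the block
        result |= (total & mask) << position
        if (block_a ^ block_b) == mask:
            # Every bit position propagates, so the block's carry-out equals
            # its carry-in and the skip path can forward it directly.
            skipped += 1
        else:
            carry = total >> width                 # carry generated or killed in the block
        position += width
    return result, carry, skipped
```

For example, `carry_skip_add(0xAAAA, 0x5555, [2, 3, 4, 4, 3], cin=1)` forwards the incoming carry through all five blocks. In hardware, the benefit is that such a carry bypasses the ripple chain of each skipped block, and the choice of block sizes determines the critical path.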
Table 3
Performance analysis of the proposed VL-CSKA based booth multiplier.
Multipliers     Conventional Booth multiplier   Modified booth multiplier [37]   Low delay-product booth multiplier [31]   Proposed
No. of slices   183                             122                              68                                        41
LUT             372                             287                              93                                        67
Delay (ns)      16.21                           14.92                            7.312                                     4.339
Power (W)       0.249                           0.135                            0.122                                     0.083
Table 4
Comparison of FIR filters with different topologies of multiplier.
Resources              FFA with HCA-VEDIC multiplier [35]   FFA with CSelA-VEDIC multiplier [35]   FIR-PVM [30]   MRPM-FIR [33]   FIR with XOR-MUX multiplier [32]   Proposed
Number of slice LUTs   1455                                 3592                                   990            1523            873                                561
Delay (ns)             32.346                               31.180                                 30.46          96.081          30.48                              28.175
Power (mW)             0.920                                0.630                                  1.923          0.796           0.423                              0.143
VL-CSKA adder design further increases the speed of the addition of the partial products by skipping the carry propagation with the help of compound gates (skip logic). Also, the proposed design consumes less power because it avoids the multiplexer in the Booth encoding structure and uses the skip logic of the VL-CSKA.
Furthermore, the proposed FIR design is compared in Table 4 with different FIR filter topologies built from other multipliers and adders. The existing FIR filter designs are: FFA with HCA-VEDIC multiplier [35], FFA with CSelA-VEDIC multiplier [35], FIR-PVM [30], MRPM-FIR [33] and FIR with XOR-MUX multiplier [32]. The proposed FIR filter design is much better than the existing designs in terms of area utilization, delay and power consumption. The existing FIR filter designs did not select the coefficients optimally. They therefore used dissimilar FCs, which increased the switching activity of the FIR filter and hence its power consumption.
Also, the proposed FIR filter design reduces the delay because of its high-speed addition and multiplication. The area requirement of the FIR filters that use a VEDIC multiplier is higher because they divide the input into 2n and again cluster it into 4n-sized lumps. This fixed stage split requires more components and also reduces the speed. The MRPM-FIR [33], in turn, increases the circuit complexity of the PP generation part. The performance of the XOR-MUX [32] multiplier-based FIR filter is reduced because XOR and multiplexer (MUX) designs usually consume more power than ordinary logic gates. The proposed FIR filter affords much better performance in terms of area utilization, delay and power than the existing FIR filters due to the inclusion of the optimal FC selection module and the VL-CSKA based Booth multiplier.

5. Conclusion

In this paper, we have designed a low-power 24-tap low-pass FIR filter with high speed and efficient area usage. The best FCs have been selected with the ESSA optimization algorithm by minimizing the ripple contents, power and transition width. ESSA chooses the best coefficient values with the best convergence rate, as shown by the simulation results. The delay of the FIR filter has been decreased with the proposed Booth multiplier, in which the PPs are added in parallel. The Booth multiplier has been modified into a multiplexer-free design with VL-CSKA to consume low power with less delay. The performance analysis showed that the proposed 24-tap low-pass FIR filter consumes only 0.142 mW of power with a delay of 28.175 ns. The comparison results also validate that the proposed FIR filter provides a better outcome in terms of power, area and speed.

Declaration of Competing Interest

Authors Aditya Mandloi and Dr. Santosh Pawar declare that they have no conflict of interest. Patients' rights and animal protection statements: This research article does not contain any studies with human or animal subjects.

References

[1] B.N.M. Kumar, H.G. Rangaraju, Low area VLSI implementation of CSLA for FIR filter design (2019).
[2] N. Thankachan, S. Cyraic, Efficient design of FIR filter using modified booth multiplier, Int. J. Sci. Res. Eng. Technol. 4 (10) (2019) 2278–2882.
[3] M. Sumalatha, P.V. Naganjaneyulu, K.S. Prasad, Low power and low area VLSI implementation of vedic design FIR filter for ECG signal de-noising, Microprocess. Microsyst. 71 (2019), 102883.
[4] S. Nagaria, A. Singh, V. Niranjan, Efficient FIR filter design using booth multiplier for VLSI applications, in: Proceedings of the International Conference on Computing, Power and Communication Technologies (GUCON), IEEE, 2018, pp. 581–584.
[5] A. Aggarwal, T.K. Rawat, M. Kumar, D.K. Upadhyay, Optimal design of FIR high pass filter based on L1 error approximation using real coded genetic algorithm, Eng. Sci. Technol. Int. J. 18 (4) (2015) 594–602.
[6] M. Kumar, Optimal design of fractional delay FIR filter using cuckoo search algorithm, Int. J. Circuit Theory Appl. 46 (12) (2018) 2364–2379.
[7] A. Aggarwal, M. Kumar, T.K. Rawat, Design of two-dimensional FIR filters with quadrantally symmetric properties using the 2D L1-method, IET Signal Process. 13 (3) (2019) 262–272.
[8] A. Aggarwal, M. Kumar, T.K. Rawat, D.K. Upadhyay, Optimal design of 2D FIR filters with quadrantally symmetric properties using fractional derivative constraints, Circuits Syst. Signal Process. 35 (6) (2016) 2213–2257.
[9] M. Kumar, T.K. Rawat, Design of fractional order differentiator using type-III and type-IV discrete cosine transform, Eng. Sci. Technol. Int. J. 20 (1) (2017) 51–58.
[10] M. Kumar, T.K. Rawat, Fractional order digital differentiator design based on power function and least squares, Int. J. Electron. 103 (10) (2016) 1639–1653.
[11] M. Kumar, T.K. Rawat, Optimal design of FIR fractional order differentiator using cuckoo search algorithm, Expert Syst. Appl. 42 (7) (2015) 3433–3449.
[12] R. Barsainya, T.K. Rawat, M. Kumar, Design of minimum multiplier fractional order differentiator based on lattice wave digital filter, ISA Trans. 66 (2017) 404–413.
[13] G. Haridas, D.S. George, Area efficient low power modified booth multiplier for FIR filter, Procedia Technol. 24 (2016) 1163–1169.
[14] S. Venkatachalam, E. Adams, H.J. Lee, S.B. Ko, Design and analysis of area and power efficient approximate booth multipliers, IEEE Trans. Comput. 68 (11) (2019) 1697–1703.
[15] E. Antelo, P. Montuschi, A. Nannarelli, Improved 64-bit radix-16 booth multiplier based on partial product array height reduction, IEEE Trans. Circuits Syst. I Regul. Pap. 64 (2) (2016) 409–418.
[16] N.V.V.K. Boppana, J. Kommareddy, S. Ren, Low-cost and high-performance 8×8 booth multiplier, Circuits Syst. Signal Process. 38 (9) (2019) 4357–4368.
[17] B. Sanjana, K. Ragini, Design of a novel high-speed-and energy-efficient 32-bit carry-skip adder, in: Innovations in Electronics and Communication Engineering, Springer, Singapore, 2019, pp. 335–343.
[18] M. Pandey, S. Mani, A. Khandelwal, Implementation on low power and less area multiplier using adder (2019).
[19] Y. Ramalakshmanna, V.Y. Varma, P.S. Kumar, T. Prasad, Modified Vedic multiplier using CSLA adders, J. Comput. Theor. Nanosci. 16 (4) (2019) 1255–1269.
[20] P. Hemalatha, A.L.I.S. Ekbal, Implementation of area and delay radix-16 booth multiplier for FIR circuits (2019).
[21] P. Pramod, T.K. Shahana, Efficient modular hybrid adders and radix-4 booth multipliers for DSP applications, Microelectron. J. (2020), 104701.
[22] K. Priyanka, A.M. Gunasekhar, Design and verilog HDL implementation of carry skip adder using kogge-stone tree logic (2018).
[23] B. Indhumathi, K. Ishwarya, K. Jamuna, G.N. Balaji, T. Prabhu, Low power and high speed carry select adder using skip logic, Int. J. Innov. Res. Sci. Technol. 5 (6) (2018) 58–62.
[24] S. Jom, J. Asha, Hybrid variable latency carry skip adder, in: Proceeding of the International Conference on Circuits and Systems in Digital Enterprise Technology (ICCSDET), IEEE, 2018, pp. 1–6.
[25] M. Indu, D. Somasundareswari, Energy efficient carry skip adder using skip logic in various voltage levels, Int. J. Manag. Inf. Technol. Eng. BEST IJMITE 3 (10) (2015) 145–150.
[26] M. Bahadori, M. Kamal, A. Afzali-Kusha, High-speed and energy-efficient carry skip adder operating under a wide range of supply voltage levels, IEEE Trans. Very Large Scale Integr. Syst. 24 (2) (2016).
[27] A.K. Dwivedi, S. Ghosh, N.D. Londhe, Modified artificial bee colony optimisation based FIR filter design with experimental validation using field-programmable gate array, IET Signal Process. 10 (8) (2016) 955–964.
[28] K. Srivatsan, N. Venkatesan, Farrow structure based FIR filter design using hybrid optimization, AEU Int. J. Electron. Commun. 114 (2020), 153020.
[29] P. Patali, S.T. Kassim, High throughput FIR filter architectures using retiming and modified CSLA based adders, IET Circuits Devices Syst. 13 (7) (2019) 1007–1017.
[30] J. Prasad, D.M. Geetha, K. Srinivasan, Experimental setup of stretchable arid dry pad sensors for the signal acquisition fir filter design using Vedic approach, Measurement 141 (2019) 209–216.
[31] H. Xue, R. Patel, N.V.V.K. Boppana, S. Ren, Low-power-delay-product radix-4 8*8 Booth multiplier in CMOS, Electron. Lett. 54 (6) (2018) 344–346.
[32] P. Radhakrishnan, G. Themozhi, FPGA implementation of XOR-MUX full adder based DWT for signal processing applications, Microprocess. Microsyst. 73 (2020), 102961.
[33] C.U. Kumar, S. Kamalraj, Ambient intelligence architecture of MRPM context based 12-tap further desensitized half band FIR filter for EEG signal, J. Ambient Intell. Humaniz. Comput. 11 (4) (2020) 1459–1466.
[34] M. Sumalatha, P.V. Naganjaneyulu, K.S. Prasad, Low-power and area-efficient FIR filter implementation using CSLA with BEC, in: Microelectronics, Electromagnetics and Telecommunications, Springer, Singapore, 2018, pp. 137–142.
[35] P. Paliwal, J.B. Sharma, V. Nath, Comparative study of FFA architectures using different multiplier and adder topologies, Microsyst. Technol. (2019) 1–8.
[36] M. Jain, V. Singh, A. Rani, A novel nature-inspired algorithm for optimization: squirrel search algorithm, Swarm Evol. Comput. 44 (2019) 148–175.
[37] A. Mandloi, S. Pawar, VLSI design of APT-VDF using novel variable block sized ternary adder and multiplier, Microprocess. Microsyst. 78 (2020), 103266.

Aditya Mandloi completed the B.E. in Electronics & Telecommunication from the Jabalpur Engineering College, Jabalpur (M.P.) in 2003, and the M.Tech. in Microelectronics and VLSI Design from the S.G.S.I.T.S. Indore (M.P.) in 2005. He is currently a Ph.D. candidate in the Department of Electronic & Communication Engineering, Dr. A. P. J. Abdul Kalam University, Indore (M.P.), India. His research interests include VLSI design and signal/image processing.

Dr. Santosh Pawar received the Doctor of Philosophy in the area of Non-Linear Fiber Optics from Devi Ahilya Vishwavidhyalaya, Indore in 2014 and the Master of Technology in Optical Communication from Shri G. S. Institute of Technology and Science, Indore, in 2007. His research areas span Microelectronics and VLSI Design, wireless communication, optical fiber communication, integrated optics, non-linear optics, optical fiber Bragg grating based devices and optical wireless communication. He has published more than 25 research papers in leading international journals indexed in SCI, WOS and SCOPUS and presented around 30 research papers in international/national conferences. Under his supervision, 15 master's theses and 1 Ph.D. thesis have been completed. He is a life member of the Indian Science Congress Association (ISCA).