Project Review: by Vamsikrishna Chemudupati 14BEC0022

PROJECT REVIEW
By
VAMSIKRISHNA CHEMUDUPATI
14BEC0022
TITLE OF THE PROJECT
NOVEL STRUCTURE FOR AREA-EFFICIENT
IMPLEMENTATION OF FIR FILTER
ABSTRACT
• It is observed that in multiplierless implementations of
transposed direct form (TDF) finite impulse response
(FIR) filters, the adders in the product accumulation
block, named structural adders (SAs), contribute the
major part of the overall logic complexity.
• A novel FIR filter structure is therefore proposed to
reduce the hardware complexity of the product
accumulation block.
• In the proposed structure, half of the long word-length
SAs are replaced by adders, named pre-structural
adders (PSAs), which have a relatively shorter
word-length.
• The filter coefficients are carefully grouped to take
advantage of the symmetric impulse response of linear
phase FIR filters.
• The overall area-delay performance and power-delay
performance of the proposed implementation are
superior to those of existing techniques.
• It is shown that area-time efficient design of multiple
constant multiplication (MCM) blocks can be obtained by
using the proposed techniques.
Literature review
SOFTWARE USED
• Coefficient generation: Matlab
• Simulation tool: ModelSim (Verilog HDL)
• Synthesis tool: Xilinx ISE

Introduction
• An FIR filter is a fundamental building block in digital
signal processing systems.
• An FIR filter is generally implemented in either direct
form or transposed direct form.
• The transposed direct form is generally preferred for
large-scale applications.
• A TDF filter consists of two parts:
1. Multiple constant multiplication (MCM) block
2. Product accumulation block
The products generated by the MCM block are delayed
and accumulated in the product accumulation block to
produce the filter output. To reduce the complexity of FIR
filters, a lot of effort has been put into the efficient
implementation of MCM blocks, and many design techniques
have been proposed. The adders in the product
accumulation block are often ignored by researchers.
The accumulation results can be rounded off, but only at
the cost of precision in the result.
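The two-part structure described above can be sketched as a small behavioural model. This is only an illustrative Python sketch, not the Verilog design: the function name `tdf_fir` and the use of plain integer multiplication in place of a multiplierless MCM block are assumptions made for brevity.

```python
def tdf_fir(samples, coeffs):
    """Behavioural model of a transposed direct form (TDF) FIR filter."""
    n_taps = len(coeffs)
    regs = [0] * (n_taps - 1)  # delay registers between the structural adders
    out = []
    for x in samples:
        # MCM block: the same input sample is multiplied by every coefficient.
        # (Plain `*` stands in for the shift-and-add hardware here.)
        products = [h * x for h in coeffs]
        # Product accumulation block: the output adds the first product
        # to the value held in the first delay register.
        out.append(products[0] + (regs[0] if regs else 0))
        # Each register takes its own product plus the register upstream of it.
        new_regs = []
        for i in range(n_taps - 1):
            upstream = regs[i + 1] if i + 1 < n_taps - 1 else 0
            new_regs.append(products[i + 1] + upstream)
        regs = new_regs
    return out


print(tdf_fir([1, 0, 0, 0], [1, 2, 3]))  # impulse in, coefficients out
```

Feeding an impulse returns the coefficients in order, which is a quick way to confirm that the delay chain is wired correctly.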
Existing system
• The existing multiplier-based structures use either the
direct form or the transposed form configuration.
• The multiplierless structures use the transposed form
configuration, whereas the distributed arithmetic (DA)
based structure uses the direct form configuration. However,
we do not find any specific block-based design for FIR
filters in the literature.
• However, the block structure obtained is not efficient
for large filter lengths and variable filter coefficients,
such as in an SDR channelizer.
Drawbacks
• Lower efficiency
• More complexity in placing memory cells in
applications having large filter lengths.
• A DA structure is fully serial, operating on one bit at a
time. If the input data sequence is W bits wide, then the FIR
structure takes W clock cycles to compute each output.
Hence this operation increases the delay.
• Being serial, the area increases.
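The W-cycle behaviour of a bit-serial DA structure can be illustrated with a short Python sketch. It assumes unsigned inputs and a single look-up table addressed by one bit from each tap; the name `da_fir_output` and the table layout are illustrative assumptions, not taken from any particular DA design.

```python
def da_fir_output(window, coeffs, width):
    """One output of a bit-serial DA FIR; window[k] holds x[n-k], unsigned."""
    k = len(coeffs)
    # Look-up table: entry `addr` stores the sum of the coefficients whose
    # address bit is 1. Its size doubles with every extra tap.
    lut = [sum(coeffs[i] for i in range(k) if (addr >> i) & 1)
           for addr in range(1 << k)]
    acc = 0
    cycles = 0
    for bit in range(width):  # one bit plane per clock cycle
        addr = 0
        for i, x in enumerate(window):
            addr |= ((x >> bit) & 1) << i
        acc += lut[addr] << bit  # shift-accumulate the table output
        cycles += 1
    return acc, cycles


print(da_fir_output([3, 5, 2, 7], [1, 2, 3, 4], width=4))
```

The cycle count equals the input width, and the table has 2^K entries for K taps, which is where the memory cost for long filters comes from.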
Proposed system
• The focus of this paper is the proposal of a novel filter
structure for efficient implementation of a given filter with
fixed filter coefficients.
• The design of these filter coefficients is outside the
scope of this paper.
• For a linear phase FIR filter, the impulse response is
symmetric, i.e., |h_k| = |h_{N-k}|. Therefore, the distinct
coefficients that need to be implemented are
{h_i | 0 ≤ i ≤ ⌈N/2⌉}.
• In the proposed structure, the multiplications of fixed
coefficients with the input variable are performed by two
separate MCM blocks. Since there is no sharing of partial
products across the two MCM blocks, overhead is
introduced by splitting an MCM block into two.
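The saving from coefficient symmetry can be sketched in a few lines of Python. This sketch only shows that one product per distinct coefficient magnitude is enough; it does not reproduce the grouping into two MCM blocks, and the function name `symmetric_products` is a hypothetical helper.

```python
def symmetric_products(x, coeffs):
    """Return the per-tap products and the number of distinct multiplications."""
    # One multiplication per distinct coefficient magnitude.
    shared = {m: m * x for m in {abs(h) for h in coeffs}}
    # Every tap reuses the shared product and restores its own sign.
    products = [shared[abs(h)] if h >= 0 else -shared[abs(h)] for h in coeffs]
    return products, len(shared)


products, n_mults = symmetric_products(4, [1, -3, 5, -3, 1])
print(products, n_mults)
```

For the five-tap symmetric example only three multiplications are needed instead of five.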
Advantages
• High efficiency
• Area efficient
• Performance is increased without undermining the
accuracy
• FIR filters with Wallace tree and MCM blocks to
reduce the delay
Proposed system block diagram
Future enhancement
• We will implement FIR filters using the Wallace adder
technique.
• The adders will be varied, such as:
1. Ripple carry adder
2. Carry look-ahead adder
• With different adders we can expect different delay,
area and power consumption.
• Hence the comparison can be studied.
Verilog code
The key elements needed are:
1. Read only memory for storage of coefficients
2. Multiple constant multiplication block.
3. A D-register for storage
4. Multiplication of coefficients and input using shift and
add technique.
5. Adding all the partial products using a Wallace adder
tree.
6. Checking the input and output using the basic FIR
filter equation
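The basic FIR filter equation used for checking is y[n] = Σ h[k]·x[n-k], summed over the taps k. A minimal Python reference model of that equation is sketched below; the name `fir_reference` is an assumption, and the test values (coefficients 1 to 16, input 10) are the ones used elsewhere in these slides.

```python
def fir_reference(samples, coeffs):
    """Direct evaluation of y[n] = sum over k of h[k] * x[n-k]."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, h in enumerate(coeffs):
            if n - k >= 0:  # samples before the start are treated as zero
                acc += h * samples[n - k]
        out.append(acc)
    return out


coeffs = list(range(1, 17))          # 16 taps, values 1 to 16
y = fir_reference([10] * 16, coeffs) # constant input of 10
print(y[0], y[-1])
```

Once all 16 taps are filled, the output settles at 10 times the sum of the coefficients, which gives a simple expected value to compare against the simulated waveform.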
Distributed arithmetic system
Why MCM is better than Direct form?
MCM:
• As the name suggests, the multiple coefficients
initialized remain constant throughout.
• Even though the input changes, the coefficients
remain the same.
• An MCM does not follow the distributed arithmetic
method, and its operation is pipelined.
• The delay is less.
Direct form:
• In direct form we need to give different coefficients
for different inputs.
• Hence this consumes time and makes it relatively
harder.
• The operation is not pipelined and it takes one
coefficient at a time.
• The delay is more, as well as the area consumed.
Read only memory
• A ROM is needed to store the coefficients, which remain the
same throughout the operation.
• In the test operation I take the coefficients to be the numbers
from 1 to 16.
• Hence the number of taps is 16.
• The ROM is mainly a D register, which follows the operation
of a D flip-flop.
• If reset = 1:
The D register gives zero as output for all coefficients.
• If reset = 0:
The D register gives the value assigned as the input.
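The reset behaviour of the coefficient store can be modelled in Python as below. This is a behavioural sketch only: the class name `DRegister`, the 8 bit width and the assumption that reset is sampled on the clock edge are illustrative choices, not details confirmed by the design.

```python
class DRegister:
    """Behavioural model of a D register with reset (width is an assumption)."""

    def __init__(self, width=8):
        self.mask = (1 << width) - 1
        self.q = 0

    def clock(self, d, reset):
        # On a clock edge: reset forces zero, otherwise the input is captured.
        self.q = 0 if reset else d & self.mask
        return self.q


# A bank of 16 registers holding the test coefficients 1 to 16.
coeff_bank = [DRegister() for _ in range(16)]


def read_coeffs(reset):
    return [reg.clock(value, reset)
            for reg, value in zip(coeff_bank, range(1, 17))]


print(read_coeffs(reset=1))  # all zeros
print(read_coeffs(reset=0))  # the stored coefficients
```

The two printed rows correspond to the two series of ROM outputs: zeros while reset is 1, then the coefficient values once reset is 0.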
ROM output
The first series of output is due to Reset value being 1
The second series of output is due to Reset value being 0
MCM Block
• In this operation we take 4 MCM blocks for 16
coefficients. Each MCM block has 4 coefficients.
• The input signal is fed into each block to generate 4
partial products, one per coefficient.
• If the number of coefficients is odd, then we take an odd
number of coefficients in each MCM block,
e.g. for 9 coefficients we can take 3 coefficients per
MCM block.
• Using Verilog code we generate 4 outputs per clock
cycle from each block.
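The grouping of coefficients into MCM blocks can be sketched as follows. This Python model is an illustration under assumptions: the function name `mcm_blocks` and the `block_size` parameter are hypothetical, and ordinary multiplication stands in for the shift-and-add hardware inside each block.

```python
def mcm_blocks(x, coeffs, block_size=4):
    """Split the coefficients into blocks and form every product with input x."""
    groups = [coeffs[i:i + block_size]
              for i in range(0, len(coeffs), block_size)]
    # Each block multiplies the same input by its own group of coefficients.
    return [[h * x for h in group] for group in groups]


blocks = mcm_blocks(10, list(range(1, 17)))   # 16 coefficients, input 10
for number, outputs in enumerate(blocks, start=1):
    print("MCM", number, outputs)
```

Calling `mcm_blocks(10, list(range(1, 10)), block_size=3)` gives the three-block arrangement mentioned for 9 coefficients.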
MCM Block structure
Z inverse blocks are the delay-producing blocks
h0 to h15 are the coefficients
4 MCM blocks
MCM 1 output
MCM 2 output
MCM 3 output
MCM 4 output
MCM OUTPUTS
• Each MCM gives 4 outputs.
• It follows the shift and add algorithm, hence no multiplier is
involved in it.
• For ease of checking, the coefficient values are
taken as the numbers from 1 to 16.
• The input is taken as 10 in binary.
• Once the final code is available we can take coefficient
values from Matlab.
D Register
• The D registers store the intermediate results during
the FIR filter operation.
• The D registers are mainly D flip-flops.
• Their working can be described as:
If reset = 1:
The D register gives zero as output for all coefficients.
If reset = 0:
The D register gives the value assigned as the input.
• The D flip-flop produces its output on the positive
clock edge.
• The following lengths are used for the D registers:
1. An 8 bit register.
2. A 17 bit register.
3. A 20 bit register to store the output y[n].
• The register size is set based on the number of
multiplication and addition operations.
• If a multiplication of two n-bit values occurs, then the
answer is 2n bits.
• If an addition of two n-bit values occurs, then the answer
is n+1 bits.
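The width rules can be checked with a small Python calculation. The sketch assumes unsigned 8 bit inputs and coefficients, and extends the n+1 rule to several operands (one extra bit per doubling of the operand count); both helper names are hypothetical. Under those assumptions the results happen to agree with the 17 bit and 20 bit registers listed above, but that reading is an inference, not something stated in the design.

```python
def product_width(n_bits):
    """Width of the product of two unsigned n-bit values."""
    return 2 * n_bits


def sum_width(n_bits, n_operands=2):
    """Width needed to add n_operands unsigned n-bit values without overflow."""
    return n_bits + (n_operands - 1).bit_length()


p = product_width(8)       # 8 bit input times 8 bit coefficient
print(p)                   # width of one partial product
print(sum_width(p))        # adding two partial products
print(sum_width(p, 16))    # adding all 16 partial products
```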
D register output
Shift and Add technique
• This technique multiplies just by shifting the bits of
the multiplicand depending on the multiplier bits.
• The position of the least significant bit (LSB) is 0 and
the position increments by 1 towards the MSB.
• If any bit in the multiplier is 1, then by seeing its
position we left shift the multiplicand by the same
number of bits.
• Add all the shifted numbers.
• In this way the product is obtained only by adding
shifted copies of the multiplicand.
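The steps above can be sketched in Python for non-negative integers. The function name `shift_add_multiply` is an assumption; the values in the usage line (input 10 and a coefficient of 13) are only an example.

```python
def shift_add_multiply(multiplicand, multiplier):
    """Multiply two non-negative integers using only shifts and additions."""
    product = 0
    position = 0  # bit position 0 is the least significant bit
    while multiplier >> position:
        if (multiplier >> position) & 1:
            # A 1 at this position adds the multiplicand shifted left by it.
            product += multiplicand << position
        position += 1
    return product


# 13 is 1101 in binary, so the result is (10 << 0) + (10 << 2) + (10 << 3).
print(shift_add_multiply(10, 13))
```

For a fixed coefficient the shift amounts are known in advance, so in hardware the shifts reduce to wiring and only the additions cost logic.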
Example of shift and add algorithm
Multiplication result of previous slide
Adder tree
• An adder tree is used to add the partial products
from each MCM block.
• The Wallace adder tree is the whole framework
involving multiplication and addition of the
coefficients and inputs.
• The Wallace adder tree can be implemented using
different adders, and then we can check various
parameters such as delay and area.
• The output has been shown for the 1st MCM
block and the same is repeated for the others.
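A word-level sketch of a Wallace-style reduction is given below in Python. It assumes non-negative operands and uses 3:2 carry-save compression until two terms remain, followed by one ordinary addition; the names `carry_save` and `wallace_style_sum` are hypothetical, and the final adder type (ripple carry, carry look-ahead) is deliberately left abstract.

```python
def carry_save(a, b, c):
    """3:2 compressor: three operands in, a sum word and a carry word out."""
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (b & c) | (a & c)) << 1
    return sum_word, carry_word


def wallace_style_sum(operands):
    """Reduce the operands with carry-save levels, then do one final add."""
    terms = list(operands)
    levels = 0
    while len(terms) > 2:
        full = len(terms) - len(terms) % 3
        reduced = []
        for i in range(0, full, 3):
            reduced.extend(carry_save(*terms[i:i + 3]))
        reduced.extend(terms[full:])  # leftovers pass to the next level
        terms = reduced
        levels += 1
    return sum(terms), levels  # sum() stands in for the final adder


# Partial products of the 1st MCM block for input 10 and coefficients 1 to 4.
print(wallace_style_sum([10, 20, 30, 40]))
```

Only the last step propagates carries across the full word, which is why the choice of final adder is the main factor when comparing delay and area.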
Output of the 1st Adder tree
Future work to be done
• All sub-modules are ready, hence the final code
can be generated by connecting them.
• In the paper the partial products are added
directly and the adder type is not mentioned.
• As a future modification I am going to try it
with different adders and compare the results.
• The parameters can be found using the Xilinx
ISE software.
