- The document proposes a novel FIR filter structure to reduce hardware complexity in the product accumulation block. It carefully groups the filter coefficients to exploit the symmetric impulse response of linear-phase FIR filters.
- Half of the long word-length structural adders are replaced by shorter word-length pre-structural adders. This improves the overall area-delay and power-delay performance compared to existing techniques.
- The multiplier-less FIR filter structure is implemented in Verilog, using the shift-and-add technique for multiplication, D registers for storage and a Wallace adder tree for accumulation. Future work involves comparing performance with different adder implementations.
PROJECT REVIEW
By VAMSIKRISHNA CHEMUDUPATI (14BEC0022)
TITLE OF THE PROJECT
NOVEL STRUCTURE FOR AREA-EFFICIENT IMPLEMENTATION OF FIR FILTER
ABSTRACT
• It is observed that in the multiplier-less implementation of transposed direct form (TDF) finite impulse response (FIR) filters, the adders in the product accumulation block, called structural adders (SAs), contribute the major part of the overall logic complexity.
• A novel FIR filter structure is therefore proposed to reduce the hardware complexity of the product accumulation block.
• In the proposed structure, half of the long word-length SAs are replaced by adders called pre-structural adders (PSAs), which have a relatively shorter word length.
• The filter coefficients are carefully grouped to take advantage of the symmetric impulse response of linear-phase FIR filters.
• The overall area-delay and power-delay performance of the proposed implementation is superior to that of existing techniques.
• It is shown that area-time-efficient designs of MCM blocks can be obtained using the proposed techniques.
Literature review
SOFTWARE USED
• Coefficient generation: MATLAB
• Simulation tool: ModelSim (Verilog HDL)
• Synthesis tool: Xilinx ISE
Introduction
• An FIR filter is a fundamental building block in digital signal processing systems.
• An FIR filter is generally implemented in either direct form or transposed direct form.
• The transposed direct form is generally preferred for large-scale applications.
• A TDF filter consists of two parts:
1. Multiple constant multiplication (MCM) block
2. Product accumulation block
The products generated by the MCM block are delayed and accumulated in the product accumulation block to produce the filter output. To reduce the complexity of FIR filters, much effort has been put into the efficient implementation of MCM blocks, and many design techniques have been proposed. The adders in the product accumulation block, however, are often ignored by researchers. A round-off can be performed on the accumulation results, but only at the cost of the precision of the result.
Existing system
• The existing multiplier-based structures use either the direct-form or the transposed-form configuration.
• Multiplier-less structures use the transposed-form configuration, whereas distributed arithmetic (DA) based structures use the direct-form configuration. However, we do not find any specific block-based design for FIR filters in the literature.
• The block structure that has been obtained, however, is not efficient for large filter lengths and variable filter coefficients, such as those of an SDR channelizer.
Drawbacks
• Lower efficiency
• Greater complexity in placing memory cells in applications with large filter lengths
• A DA structure is fully serial, operating on one bit at a time. If the input data is W bits wide, the FIR structure takes W clock cycles to compute each output, which increases the delay.
• Because the structure is serial, the area increases.
Proposed system
• The focus of this paper is a novel filter structure for the efficient implementation of a given filter with fixed filter coefficients.
• The design of the filter coefficients themselves is outside the scope of this paper.
• For a linear-phase FIR filter, the impulse response is symmetric, i.e., $|h_k| = |h_{N-k}|$. Therefore, the distinct coefficients that need to be implemented are $\{h_i \mid 0 \le i \le \lceil N/2 \rceil\}$.
• In the proposed structure, the multiplications of the fixed coefficients with the input variable are performed by two separate MCM blocks. Since partial products are not shared across the two MCM blocks, splitting one MCM block into two introduces some overhead.
Advantages
• High efficiency
• Area efficient
• Performance is increased without undermining accuracy
• FIR filters with a Wallace tree and MCM blocks to reduce the delay
[Figure: Proposed system block diagram]
Future enhancement
• We will implement FIR filters using the Wallace adder technique.
• The adders will be varied, for example:
1. Ripple carry adder
2. Carry look-ahead adder
• With different adders we can expect different delay, area and power consumption.
• Hence the designs can be compared.
Verilog code
The key elements needed are:
1. Read-only memory for storage of the coefficients
2. Multiple constant multiplication (MCM) block
3. A D register for storage
4. Multiplication of coefficients and input using the shift-and-add technique
5. Addition of all the partial products using a Wallace adder tree
6. Checking the input and output against the basic FIR filter equation
[Figure: Distributed arithmetic system]
Why is MCM better than direct form?
MCM:
• As the name suggests, the multiple coefficients remain constant throughout.
• Even though the input changes, the coefficients remain the same.
• MCM does not follow the distributed arithmetic method, and the operation is pipelined.
• The delay is lower.
Direct form:
• In direct form, different coefficients need to be initialized for different inputs.
• Hence this consumes time and makes it relatively harder.
• The operation is not pipelined, and it takes one coefficient at a time.
• Both the delay and the area consumed are greater.
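To make the DA drawback concrete (W clock cycles for W-bit inputs), the following Python sketch models a bit-serial distributed arithmetic inner product: a look-up table holds all 2^N partial sums of the coefficients, and each loop iteration stands for one clock cycle that consumes one bit of every input sample. It assumes unsigned W-bit inputs (no sign-bit handling) and illustrative coefficient and sample values; the function names are hypothetical, and this is not the design used in this project.

```python
def build_da_lut(coeffs):
    """Precompute all 2**N partial sums of the coefficients (the DA look-up table)."""
    lut = []
    for address in range(2 ** len(coeffs)):
        # Bit k of the address decides whether coeffs[k] is part of this partial sum.
        lut.append(sum(c for k, c in enumerate(coeffs) if (address >> k) & 1))
    return lut


def da_inner_product(coeffs, samples, width):
    """Compute sum(coeffs[k] * samples[k]) bit-serially over `width` input bits.

    Assumes unsigned samples with 0 <= samples[k] < 2**width.
    Returns (result, cycles); each loop iteration models one clock cycle.
    """
    lut = build_da_lut(coeffs)
    acc = 0
    cycles = 0
    for bit in range(width):  # LSB first
        # The LUT address is formed from the same bit of every input sample.
        address = 0
        for k, x in enumerate(samples):
            address |= ((x >> bit) & 1) << k
        acc += lut[address] << bit  # weight this partial sum by 2**bit
        cycles += 1
    return acc, cycles


if __name__ == "__main__":
    h = [1, 2, 3, 4]        # illustrative coefficients
    x = [10, 200, 7, 255]   # illustrative 8-bit input samples
    print(da_inner_product(h, x, width=8))  # (1451, 8)
```

With 8-bit inputs the result needs 8 iterations, which is the W-cycle latency listed under Drawbacks; the look-up table also grows as 2^N with the number of coefficients that share one address.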
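The TDF structure described in the Introduction (an MCM block followed by a delayed product accumulation chain) can be checked against the basic FIR equation $y[n] = \sum_k h_k x[n-k]$, which is also key element 6 of the Verilog plan. The Python sketch below is a behavioural reference model only, assuming integer samples and zero initial register contents; the function names are illustrative, and the coefficients 1 to 16 mirror the test values used in this project.

```python
def direct_form(h, x):
    """Reference model: y[n] = sum_k h[k] * x[n - k], with x[m] = 0 for m < 0."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def transposed_form(h, x):
    """TDF model: one MCM step per input sample, then a chain of delay
    registers (z^-1) and structural adders that accumulates the products."""
    taps = len(h)
    regs = [0] * (taps - 1)  # delay registers between the structural adders
    y = []
    for sample in x:
        # MCM block: the same input sample multiplied by every fixed coefficient.
        products = [coeff * sample for coeff in h]
        # Output adder: the h[0] product plus the delayed partial sum.
        y.append(products[0] + (regs[0] if regs else 0))
        # Structural adders: each register takes its product plus the old value
        # of the next register. Updating in increasing index order reads
        # regs[i + 1] before it is overwritten, as clocked registers would.
        for i in range(taps - 1):
            delayed = regs[i + 1] if i + 1 < taps - 1 else 0
            regs[i] = products[i + 1] + delayed
    return y


if __name__ == "__main__":
    h = list(range(1, 17))      # 16 taps with coefficients 1..16, as in the test setup
    x = [10, 3, 0, 7, 255, 1]   # illustrative input samples
    print(transposed_form(h, x) == direct_form(h, x))  # True
```

Both models produce the same output sequence; the TDF version makes visible that every product is formed from the current sample (the MCM block), while the adders in the accumulation chain carry the long word-length partial sums that the proposed structure targets.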
Read only memory
• A ROM is needed to store the coefficients, which remain the same throughout the operation.
• In the test, the coefficients are taken as the numbers 1 to 16.
• Hence the filter has 16 taps.
• The ROM is essentially a D register, which behaves as a D flip-flop:
• If reset = 1, the D register outputs zero for all coefficients.
• If reset = 0, the D register outputs the value assigned at its input.
[Figure: ROM output. The first series of outputs is due to reset = 1; the second series is due to reset = 0.]
MCM Block
• In this design, 4 MCM blocks are used for the 16 coefficients, with 4 coefficients per MCM block.
• The input signal is fed into each block to generate 4 partial products, one per coefficient.
• If the number of coefficients is odd, an odd number is taken in each MCM block; e.g., for 9 coefficients, 3 coefficients per MCM block can be used.
• The Verilog code generates 4 outputs per clock cycle from each block.
MCM Block structure
• The z⁻¹ (Z inverse) blocks are the delay elements.
• h0 to h15 are the coefficients.
[Figure: The 4 MCM blocks and the outputs of MCM blocks 1 to 4]
MCM OUTPUTS
• Each MCM block gives 4 outputs.
• It uses the shift-and-add algorithm, so no multiplier is involved.
• For ease of checking, the coefficient values are taken as the numbers 1 to 16.
• The input is taken as 10 in binary.
• Once the final code is available, the coefficient values can be taken from MATLAB.
D Register
• The D registers store the intermediate results during the FIR filter operation.
• The D registers are essentially D flip-flops. Their working can be described as:
• If reset = 1, the D register outputs zero for all coefficients.
• If reset = 0, the D register outputs the value assigned at its input.
• The D flip-flop is triggered on the positive clock edge.
• Four lengths are used for the D registers:
1. An 8-bit register
2. A 17-bit register
3. An 18-bit register
4. A 19-bit register
• A 20-bit register stores the output y[n].
• The register size is set by the number of multiplications and additions performed.
• A multiplication of two n-bit operands gives a 2n-bit result.
• An addition of two n-bit operands gives an (n+1)-bit result.
[Figure: D register output]
Shift and Add technique
• This technique multiplies by shifting the multiplicand according to the bits of the multiplier.
• Bit positions are counted from the least significant bit (LSB), which is position 0, and increase by 1 towards the MSB.
• For every multiplier bit that is 1, the multiplicand is left-shifted by that bit's position.
• All the shifted numbers are added.
• In this way, only shifted copies of the multiplicand are added, so no multiplier circuit is needed.
[Figure: Example of the shift-and-add algorithm and the resulting multiplication]
Adder tree
• An adder tree is used to add the partial products from each MCM block.
• The Wallace adder tree is the whole framework involving the multiplication and addition of the coefficients and inputs.
• The Wallace adder tree can be implemented using different adders, and parameters such as delay and area can then be compared.
• The output is shown for the 1st MCM block; the same is repeated for the others.
[Figure: Output of the 1st adder tree]
Future work to be done
• All submodules are ready, so the final code can be generated by connecting them.
• In the paper, the partial products are added directly and the type of adder is not specified.
• For future modification, I am going to try different adders and compare the results.
• The parameters can be found using the Xilinx ISE software.
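The word-length rule stated for the D registers (2n bits after a multiplication, n+1 bits after an addition) can be applied mechanically. The Python sketch below assumes 8-bit input samples, 8-bit coefficients and a chain of four successive additions after the multiplication; under those assumptions it reproduces the 17-, 18-, 19- and 20-bit lengths listed for the D registers, although the actual adder arrangement in the design may differ. The helper names are hypothetical.

```python
def product_width(a_bits, b_bits):
    """Bits needed for the product of unsigned a_bits x b_bits operands (2n when equal)."""
    return a_bits + b_bits


def sum_width(n_bits):
    """Bits needed for the sum of two unsigned n-bit operands."""
    return n_bits + 1


def accumulation_widths(sample_bits, coeff_bits, additions):
    """Register widths after one multiplication followed by `additions` additions."""
    width = product_width(sample_bits, coeff_bits)
    widths = []
    for _ in range(additions):
        width = sum_width(width)
        widths.append(width)
    return widths


def fits(value, bits):
    """True if a non-negative value fits in an unsigned register of `bits` bits."""
    return 0 <= value < 2 ** bits


if __name__ == "__main__":
    print(accumulation_widths(8, 8, 4))                # [17, 18, 19, 20]
    worst_product = 255 * 255                          # largest 8-bit x 8-bit product
    print(fits(worst_product, product_width(8, 8)))    # True
```

The `fits` check confirms that the worst-case product and sums never overflow the widths the rule assigns, which is why the rule is a safe way to size the registers.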
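The shift-and-add steps described above can be sketched in Python as follows, assuming non-negative integer operands. Bit positions are counted from the LSB (position 0). For a fixed coefficient in an MCM block the multiplier is a constant, so in hardware the shift amounts become fixed wiring and only the adders remain; the function names here are illustrative.

```python
def partial_products(multiplicand, multiplier):
    """Shifted copies of `multiplicand`, one for each 1-bit of `multiplier`."""
    if multiplicand < 0 or multiplier < 0:
        raise ValueError("this sketch handles non-negative integers only")
    partials = []
    position = 0  # bit position, counted from the LSB (position 0)
    while multiplier:
        if multiplier & 1:
            partials.append(multiplicand << position)
        multiplier >>= 1
        position += 1
    return partials


def shift_and_add(multiplicand, multiplier):
    """Multiply using only shifts and additions."""
    return sum(partial_products(multiplicand, multiplier))


if __name__ == "__main__":
    # 10 = 0b1010 has 1-bits at positions 1 and 3, so 13 is shifted by 1 and by 3.
    print(partial_products(13, 10))  # [26, 104]
    print(shift_and_add(13, 10))     # 130
```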
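A Wallace-style adder tree can be modelled at word level with 3:2 carry-save compressors: each layer turns every group of three operands into a sum word and a carry word without propagating carries, and a single carry-propagate addition finishes the job. That final addition is where a ripple-carry or carry look-ahead adder would be plugged in for the comparison planned as future work. This is a behavioural Python sketch for non-negative integers with hypothetical function names, not a gate-level Verilog design.

```python
def carry_save_add(a, b, c):
    """3:2 compressor applied bitwise: returns (sum, carry) with a + b + c == sum + carry."""
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1  # majority bit moves one column left
    return partial_sum, carry


def wallace_tree_sum(operands):
    """Reduce non-negative operands with layers of 3:2 compressors until two
    remain, then do one carry-propagate addition. Returns (total, layers)."""
    rows = list(operands)
    layers = 0
    while len(rows) > 2:
        next_rows = []
        full_groups = len(rows) // 3
        for g in range(full_groups):
            s, c = carry_save_add(*rows[3 * g:3 * g + 3])
            next_rows.extend([s, c])
        next_rows.extend(rows[3 * full_groups:])  # 0 to 2 leftover rows pass through
        rows = next_rows
        layers += 1
    total = sum(rows)  # final carry-propagate adder (e.g. ripple-carry or CLA)
    return total, layers


if __name__ == "__main__":
    mcm_outputs = [13 * c for c in (1, 2, 3, 4)]  # illustrative outputs of one MCM block
    print(wallace_tree_sum(mcm_outputs))          # (130, 2)
```

Four MCM outputs need two compressor layers before the final adder, and sixteen operands need six, which shows how the tree depth grows only slowly with the number of partial products.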