CHAPTER-1
INTRODUCTION
1.1 INTRODUCTION
Multipliers are essential parts of digital systems such as Arithmetic and Logic Units,
Digital Signal Processors, etc. They largely determine the performance of the system in
terms of power, delay and area utilization. Hence there is an increasing demand for
improving multiplier performance. A multiplier consists of three stages: partial product
generation, partial product reduction and a final addition stage. The second stage (partial
product reduction) consumes the most time and power. Various techniques have been
suggested to shorten the multiplier's critical stages. The most popular technique is using
compressors in the partial product reduction stage. A compressor is simply an adder
circuit: it takes a number of equally weighted bits, adds them, and produces sum
signals. Compressors are commonly used with the aim of reducing and accumulating a
large number of inputs into a smaller number in a parallel manner. Their main application
is within a multiplier, where a large number of partial products have to be summed
concurrently. The inner structure of compressors avoids carry propagation: either there
are no carry signals, or they arrive at the same time as the internal values. To reduce the
delay in the second stage, several compressors are needed. Small compressors are useful
for designing small multipliers.
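To make the idea concrete, the simplest compressor is the 3:2 compressor, which is just a full adder viewed as a counter of ones. A minimal Verilog sketch follows (the names here are illustrative, not taken from the project sources):

module compressor_3_2(
input x0, x1, x2, // three equally weighted input bits
output sum, // weight-1 output
output carry // weight-2 output
);
// sum and carry together encode the number of ones among the inputs
assign sum = x0 ^ x1 ^ x2;
assign carry = (x0 & x1) | (x1 & x2) | (x0 & x2);
endmodule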
In multiplier design, different sizes of compressors are required depending upon
the bit size. In this paper, a scheme for delay reduction in a 16-bit Wallace tree multiplier
with a 15:4 compressor is considered. To build the 15:4 compressor, a 5:3 compressor is
used as a basic module. AND gates are used for the generation of partial products: for an
N-bit multiplier, N^2 AND gates are needed. In the partial product reduction phase,
there are three major components, namely the half adder, the full adder and the 5-3
compressor. The final stage of addition is done using a Kogge-Stone adder. Fig. 1 shows
the structure of the 16×16 multiplier. Simulation results show that the approximate
multiplier with compressor using the Kogge-Stone adder achieves higher performance
compared to multipliers with compressor using another adder such as a parallel adder.
This paper is elaborated in the following sections: the design of the approximate 16×16
Wallace tree multiplier is detailed in Section II, and brief notes on the design of the 15-4
compressor, 5-3 compressor and Kogge-Stone adder are described in the subsequent
sections.
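As a sketch of the partial product generation stage just described (the module, parameter and signal names are illustrative, not taken from the project sources), the N^2 AND gates can be written with a generate loop:

module pp_gen #(parameter N = 16)(
input [N-1:0] a, b,
output [N*N-1:0] pp // partial product of row i, column j at index i*N+j
);
genvar i, j;
generate
for (i = 0; i < N; i = i + 1) begin : row
for (j = 0; j < N; j = j + 1) begin : col
// one AND gate per partial-product bit
assign pp[i*N + j] = a[j] & b[i];
end
end
endgenerate
endmodule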
Ever-larger numbers of devices have been integrated onto chips over time. The first
integrated circuits held just a few devices, perhaps as many as ten diodes, transistors,
resistors and capacitors, making it possible to fabricate a small number of logic gates on
a single device. Now known retrospectively as "small-scale integration" (SSI),
developments in the technology led to devices with hundreds of logic gates, known as
large-scale integration (LSI). Present technology has moved far beyond this mark, and
modern microprocessors have many millions of gates and hundreds of millions of
individual transistors.
As of early 2008, billion-transistor processors are commercially available, an
example of which is Intel's Montecito Itanium chip. This is expected to become more
common as semiconductor fabrication moves from the current generation of 65 nm
processes to the next 45 nm generations (while experiencing new challenges such as
increased variation across process corners). Another notable example is NVIDIA's 280
series GPU. This processor is unique in that its 1.4-billion-transistor count, capable of a
teraflop of performance, is almost entirely dedicated to logic (Itanium's transistor count
is largely due to its 24 MB L3 cache). Current designs, as opposed to the earliest devices,
use extensive design automation and automated logic synthesis to lay out the transistors,
enabling higher levels of complexity in the resulting logic functionality. Certain
high-performance logic blocks such as the SRAM cell, however, are still designed by
hand to ensure the highest efficiency (sometimes by bending or breaking established
design rules to obtain the last bit of performance by trading off stability).
WHAT IS VLSI?
VLSI stands for "Very Large-Scale Integration". This is the field which deals with
packing more logic devices into smaller areas.
VLSI
• Simply put, an integrated circuit is many transistors on one chip.
• Design/manufacturing of extremely small, complex circuitry using modified semiconductor material.
• An integrated circuit (IC) may contain millions of transistors, yet be only a few mm in size.
• Applications are wide-ranging: most electronic logic devices.
These advantages of integrated circuits translate into advantages at the system level:
Smaller physical size. Smallness is frequently a benefit in itself: consider portable
televisions or handheld cellular telephones.
Lower power consumption. Substituting a single chip for a handful of standard parts
decreases total power utilization. Reducing power consumption has a ripple effect on the
rest of the system: a smaller, cheaper power supply can be used; since less power
consumption means less heat, a fan may no longer be essential; and a simpler cabinet with
less electromagnetic shielding may be feasible, too.
Reduced cost. Reducing the number of components, the power supply requirements,
cabinet costs, and so on, will inevitably reduce system cost. The ripple effect of integration
is such that the cost of a system built from custom ICs can be less, even though the
individual ICs cost more than the standard parts they replace.
Understanding why integrated circuit technology has such a profound influence on the
design of digital systems requires understanding both the technology of IC manufacturing
and the economics of ICs and digital systems. The growing sophistication of applications
continually pushes the design and manufacturing of integrated circuits and electronic
systems to new levels of complexity. Perhaps the most potent characteristic of this
collection of systems is its variety: as systems become more complex, we build not just a
few general-purpose computers but an ever-wider variety of special-purpose systems.
Our ability to do so is a testimony to our growing mastery of both integrated circuit
manufacturing and design, but the growing demands of customers continue to test the
limits of design and manufacturing.
1.4.1 ASIC
Field-programmable gate arrays (FPGAs) are the modern-day technology for building
a breadboard or prototype from standard parts; programmable logic blocks and
programmable interconnects allow the same FPGA to be used in many different
applications. For smaller designs and/or lower production volumes, FPGAs may be more
cost-effective than an ASIC design, even in production.
An application-specific integrated circuit (ASIC) is an integrated circuit (IC) customized
for a particular use, rather than intended for general-purpose use. A structured ASIC falls
between an FPGA and a standard-cell-based ASIC. Structured ASICs are used mainly for
mid-volume designs. The design task for structured ASICs is to map the circuit onto a
fixed arrangement of known cells.
Among different arithmetic blocks, the multiplier is one of the main blocks, which is
widely used in different applications especially signal processing applications. There are
two general architectures for the multipliers, which are sequential and parallel. While
sequential architectures are low power, their latency is very large. On the other hand,
parallel architectures (such as Wallace tree and Dadda) are fast while having high-power
consumptions. The parallel multipliers are used in high-performance applications where
their large power consumptions may create hot-spot locations on the die. Since the power
consumption and speed are critical parameters in the design of digital circuits, the
optimizations of these parameters for multipliers become critically important. Very often,
the optimization of one parameter is performed considering a constraint for the other
parameter. Specifically, achieving the desired performance (speed) under the limited
power budget of portable systems is a challenging task. In addition, having a given level
of reliability may be another obstacle in reaching the system's target performance.
To meet the power and speed specifications, a variety of methods at different design
abstraction levels have been suggested. Approximate computing approaches are based on
achieving the target specifications at the cost of reducing the computation accuracy. The
approach may be used for applications where there is not a unique answer and/or a set of
answers near the accurate result can be considered acceptable. These applications include
multimedia processing, machine learning, signal processing, and other error resilient
computations. Approximate arithmetic units are mainly based on the simplification of the
arithmetic units’ circuits. There are many prior works focusing on approximate multipliers
which provide higher speeds and lower power consumptions at the cost of lower
accuracies. Almost all of the proposed approximate multipliers are based on having a fixed
level of accuracy during runtime. Runtime accuracy reconfigurability, however, is
considered as a useful feature for providing different levels of quality of service during the
system operation. Here, by reducing the quality (accuracy), the delay and/or power
consumption of the unit may be reduced. In addition, some digital systems, such as general-
purpose processors, may be utilized for both approximate and exact computation modes.
An approach for achieving this feature is to use an approximate unit along with a
corresponding correction unit. The correction unit, however, increases the delay, power,
and area overhead of the circuit. Also, the error correction procedure may require more
than one clock cycle, which could, in turn, slow down the processing further.
In this paper, we present four dual-quality reconfigurable approximate 4:2
compressors, which provide the ability of switching between the exact and approximate
operating modes during the runtime. The compressors may be utilized in the architectures
of dynamic quality configurable parallel multipliers. The basic structures of the proposed
compressors consist of two parts: an approximate part and a supplementary part. In the
approximate mode, only the approximate part is active, whereas in the exact operating
mode, the supplementary part along with some components of the approximate part is
invoked.
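For reference, a widely used exact 4:2 compressor takes four equally weighted bits plus a carry-in and produces a sum, a carry and a carry-out. The sketch below shows the standard textbook structure, not necessarily the exact circuits proposed in the cited work:

module compressor_4_2(
input x1, x2, x3, x4, cin,
output sum, carry, cout
);
// cout depends only on x1..x3, so it is produced without waiting for cin
assign cout = (x1 ^ x2) ? x3 : x1;
assign sum = x1 ^ x2 ^ x3 ^ x4 ^ cin;
assign carry = (x1 ^ x2 ^ x3 ^ x4) ? cin : x4;
endmodule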
APPROXIMATE COMPUTING
"The need for approximate computing is driven by two factors: a fundamental shift in
the nature of computing workloads, and the need for new sources of efficiency," said a
Purdue professor of electrical and computer engineering, who has been working in the
field for about five years. "Computers were first designed to be precise calculators that
solved problems where they were expected to produce an exact numerical value.
However, the demand for computing today is driven by very different applications.
Mobile and embedded devices need to process richer media, and are getting smarter –
understanding us, being more context-aware and having more natural user interfaces. On
the other hand, there is an explosion in digital data searched, interpreted, and mined by
data centers."
"The nature of these computations is different from the traditional computations where
you need a precise answer," said Srimat Chakradhar, department head for Computing
Systems Architecture at NEC Laboratories America, who collaborated with the Purdue
team. "Here, you are looking for the best match since there is no golden answer, or you
are trying to provide results that are of acceptable quality, but you are not trying to be
perfect."
However, today's computers are designed to compute precise results even when it is
not necessary. Approximate computing could endow computers with a capability similar
to the human brain's ability to scale the degree of accuracy needed for a given task. New
findings were detailed in research presented during the IEEE/ACM International
Symposium on Microarchitecture, Dec. 7-11 at the University of California, Davis.
The inability to perform to the required level of accuracy is inherently inefficient and
saps energy.
"If I asked you to divide 500 by 21 and I asked you whether the answer is greater than
one, you would say yes right away," Raghunathan said. "You are doing division but not
to the full accuracy. If I asked you whether it is greater than 30, you would probably take
a little longer, but if I ask you if it's greater than 23, you might have to think even harder.
The application context dictates different levels of effort, and humans are capable of this
scalable approach, but computer software and hardware are not like that. They often
compute to the same level of accuracy all the time."
"In order to have a broad impact we need to be able to apply this technology to
programmable processors," Roy said. "And now we have shown how to design a
programmable processor to perform approximate computing."
The researchers achieved this milestone by altering the "instruction set," which is the
interface between software and hardware. "Quality fields" added to the instruction set allow
the software to tell the hardware the level of accuracy needed for a given task. They have
created a prototype programmable processor called Quora based on this approach.
"You are able to program for quality, and that's the real hallmark of this work," lead
author Venkataramani said. "The hardware can use the quality fields and perform energy
efficient computing, and what we have seen is that we can easily double energy
efficiency."In other recent work, led by Chippa, the Purdue team fabricated an
approximate "accelerator" for recognition and data mining."We have an actual hardware
platform, a silicon chip that we've had fabricated, which is an approximate processor for
recognition and data mining," Raghunathan said. "Approximate computing is far closer to
reality than we thought even a few years ago."
Approximate computing leverages the intrinsic resilience of applications to
inexactness in their computations, to achieve a desirable trade-off between efficiency
(performance or energy) and acceptable quality of results. To broaden the applicability of
approximate computing, we propose quality programmable processors, in which the
notion of quality is explicitly codified in the HW/SW interface, i.e., the instruction set.
The ISA of a quality programmable processor contains instructions associated with
quality fields to specify the accuracy level that must be met during their execution. We
show that this ability to control the accuracy of instruction execution greatly enhances the
scope of approximate computing, allowing it to be applied to larger parts of programs.
The micro-architecture of a quality programmable processor contains hardware
mechanisms that translate the instruction-level quality specifications into energy savings.
Additionally, it may expose the
actual error incurred during the execution of each instruction (which may be less than the
specified limit) back to software. As a first embodiment of quality programmable
processors, we present the design of Quora, an energy efficient, quality programmable
vector processor. Quora utilizes a 3-tiered hierarchy of processing elements that provide
distinctly different energy vs. quality trade-offs, and uses hardware mechanisms based on
precision scaling with error monitoring and compensation to facilitate quality
programmable execution. We evaluate an implementation of Quora with 289 processing
elements in 45nm technology. The results demonstrate that leveraging quality-
programmability leads to 1.05X-1.7X savings in energy for virtually no loss (< 0.5%) in
application output quality, and 1.18X-2.1X energy savings for modest impact (<2.5%) on
output quality. Our work suggests that quality programmable processors are a significant
step towards bringing approximate computing to the mainstream.
CHAPTER-2
LITERATURE SURVEY
filter design. Efficient circuit-level techniques, namely a new carry select adder and
Conditional Capture Flip-Flop (CCFF), were also used to further improve power and
performance. The suggested FIR filter architecture was implemented in 0.25 μm
technology. Experimental results on a 10-tap low-pass CSHM FIR filter showed speed and
power improvements of 19% and 17%, respectively.
Bruce, H., et al., "Power Optimization of Reconfigurable FIR Filter," IEEE Transactions,
2004. This paper describes power optimization techniques applied to a reconfigurable
digital finite impulse response (FIR) filter used in a Universal Mobile Telephone Service
(UMTS) mobile terminal. Various methods of optimization for implementation were
combined to achieve low cost in terms of power consumption. Each optimization method
is described in detail and applied to the reconfigurable filter. The optimization methods
achieved a 78.8% reduction in complexity for the multipliers in the FIR structure. An
automated method for transforming coefficient multipliers into bit-shifts is also
presented.
Süleyman, S.D. and Andrew G.D., "Efficient Implementation of Digital Filters Using
Novel Reconfigurable Multiplier Blocks," IEEE Transactions, 2004. Reconfigurable
Multiplier Blocks (ReMB) offer significant complexity reductions in multiple constant
multiplications in time-multiplexed digital filters. The ReMB technique was employed in
the implementation of a half-band 32-tap FIR filter on both Xilinx Virtex FPGA and
UMC 0.18 μm CMOS technologies. Reference designs had also been built by deploying
standard time-multiplexed architectures and the off-the-shelf Xilinx Core Generator
system for the FPGA design. All designs were then compared for their area and delay
figures. It was shown that the ReMB technique can significantly reduce the area of the
multiplier circuitry and the coefficient store, as well as reducing the delay.
three bits. We have chosen the 5-3 compressor because it is the basic module for the 15-4
compressor. The error rate and error distance of each design are considered.
DESIGN 1
In this design, initially output O2 of the 5-3 compressor is approximated. The logical AND
between inputs X3 and X2 matches the accurate output O2 of the conventional 5-3
compressor with an error rate of 18.75%. The following expressions show design 1 of the
5-3 approximate compressor.
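Since the expressions themselves are not reproduced in this copy, the following Verilog sketch illustrates Design 1 as described in the text: O2 is replaced by the AND of X3 and X2, while O1 and O0 are assumed here to keep their exact values (computed behaviorally):

module approx_5_3_design1(
input X0, X1, X2, X3, X4,
output O2, O1, O0
);
// exact 3-bit count of ones, used for the unmodified outputs
wire [2:0] s = X0 + X1 + X2 + X3 + X4;
assign O2 = X3 & X2; // approximated per Design 1 (18.75% error rate)
assign O1 = s[1]; // assumed exact in Design 1
assign O0 = s[0]; // assumed exact in Design 1
endmodule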
DESIGN 2
In this design, O2 and O1 are approximated and O0 is kept the same as the original
expression. The error distance of all the error cases is either -2 or +2. From the truth table,
it can be noted that the pass rate of O2 is 87.5% when O2 alone is replaced with its
approximate version in the 5-3 compressor. Similarly, the pass rate of O1 is 75% when
compared with the O1 output of the exact 5-3 compressor. The expressions for O2 and O1
are modified to get the minimum error distance. The overall pass rate of this design is
75%. The output of the compressor differs only in eight input cases. In this design, the
critical path is between input X0 and output O0. Four XOR gates are involved in the
critical path. This design has the shortest critical path among the proposed designs.
using Xilinx ISE 14.5 software. The multiplier occupies about 15% of the total coverage
area. The dissipated power and delay of the multiplier are 0.042 μW and 3.125 ns,
respectively.
2.4 OBJECTIVE
Most of the students of Electronics Engineering are exposed to Integrated Circuits
(IC's) at a very basic level, involving SSI (small scale integration) circuits like logic gates
or MSI (medium scale integration) circuits like multiplexers, parity encoders etc. But there
is a lot bigger world out there involving miniaturization at levels so great, that a micrometer
and a microsecond are literally considered huge! This is the world of VLSI - Very Large-
Scale Integration. This chapter aims to introduce Electronics Engineering students to the
possibilities and the work involved in this field.
2.5 THEORY OF PROJECT
To implement the project, our goal is to increase the computing efficiency and to
decrease the power consumption by introducing the Kogge-Stone adder and the Wallace
tree multiplier.
2.6 REQUIRED COMPONENTS
• Xilinx ISE 14.5 software
• Wallace tree multiplier
• Kogge-Stone adder
• 15-4 compressor
• Power supply
• Display
2.7 WALLACE TREE MULTIPLIER
The multiplier is a substantive part of electronic devices and decides the overall
performance of the system. When designing a multiplier, a large amount of power is
consumed and delay is generated. To minimize these disadvantages, adders and
compressors are used. Hence, reducing delay in the multiplier has been a main aim in
enhancing the performance of digital systems like DSP processors, and many attempts
have been made to make multipliers faster. An effective hardware realization of such a
digital system is the Wallace tree, which multiplies two numbers while minimizing the
number of partial products. In vector processors, several multiplications are performed to
obtain data or loop-level parallelism. High processing speed and low power consumption
are the major advantages of this multiplier.
Fig. 3 and Fig. 4 describe the structure and schematic view of the 16-bit multiplier built
with the help of the 15-4 compressor. In this design, each dot denotes a partial product.
From the 13th column onwards, 15-4 compressors are used in this multiplier architecture.
Column number 13 consists of 13 partial products; in order to get 15 partial products, 2
zeros are added. Similarly, in the 14th column, one zero is added. Approximate
compressors are used in the 13th, 14th and 15th columns of the multiplier. The partial
product reduction phase consists of half adders, full adders and 5:3 compressors. When
the number of bits in a column is 2 or 3, half adders and full adders are used in that
column. In the case of a single bit, it is moved further to the subsequent level of that
particular column without any need for further processing. This reduction process is
repeated until only two rows remain. Finally, summation of the last two rows is achieved
using a 4-bit Kogge-Stone adder.
A. 5-3 COMPRESSOR
The 15-4 compressor uses the 5-3 compressor as its basic building block. The 5-3
compressor takes five primary inputs, namely A0, A1, A2, A3, A4, and produces three
outputs, namely
B0, B1, B2. In this compressor, the number of 1s present at the inputs decides the output
of the compressor, which thus works as a counter.
The compression of the given 5 inputs into 3 outputs constitutes the design of the 5-3
compressor. The error rate of the 5-3 compressor is considered. The design equations of
the 5-3 approximate compressor are shown in the following equations. The logic diagram
of the approximate 5-3 compressor is shown in Fig. 6.
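Because the 5-3 compressor behaves as a counter of ones, its exact function can be captured behaviorally in Verilog. The sketch below is illustrative (the gate-level design in the report is built from XOR/AND logic instead):

module compressor_5_3(
input A0, A1, A2, A3, A4,
output B2, B1, B0
);
// the 3-bit output is the binary count of ones among the five inputs (0 to 5)
assign {B2, B1, B0} = A0 + A1 + A2 + A3 + A4;
endmodule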
In 1973, Peter M. Kogge and Harold S. Stone introduced the concept of an efficient
and high-performance adder called the Kogge-Stone adder. It is basically a parallel prefix
adder, known for the fastest addition for a given design time [9], [10].
The functional block diagram and RTL view of a 4-bit Kogge-Stone adder are shown.
Using the ith bits of the given inputs, the propagate signals Pi and generate signals Gi are
calculated. These signals then produce the output carry signals. To minimize the
computation delay, the operation of prefix adders is mainly divided into 3 stages:
A. Pre-Processing
B. Generation of Carry
C. Final Processing
A. Pre-Processing
In this stage, the generate and propagate signals are computed as given by equations 5 and 6.
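The referenced equations are not reproduced in this copy; for a standard prefix adder, the pre-processing signals take the form

Pi = Ai XOR Bi (propagate)
Gi = Ai AND Bi (generate)

where Ai and Bi are the ith bits of the two operands.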
B. Generation of Carry
In this stage, carries are calculated for their corresponding bit positions, and this operation
is executed in a parallel manner. Carry propagate and carry generate signals are used as
intermediate signals. The logic equations for carry propagate and generate are shown
below.
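The equations themselves are not reproduced in this copy; in a standard Kogge-Stone prefix stage, two adjacent (generate, propagate) groups are combined as

G[i:j] = G[i:k] OR (P[i:k] AND G[k-1:j])
P[i:j] = P[i:k] AND P[k-1:j]

so that after log2(N) stages, each bit position holds the group generate (carry) covering all lower bit positions.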
C. Final Processing
In the final processing stage, the sum and carry output bits are computed for the given
input bits; the logic equation for this stage is given below.
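In the standard formulation, the final stage computes Si = Pi XOR Ci-1, where Ci-1 is the group carry into bit i. Putting the three stages together, a minimal 4-bit Kogge-Stone adder sketch in Verilog is given below (with no carry-in, matching the 4-bit final addition described earlier; the module and signal names are illustrative):

module kogge_stone_4(
input [3:0] a, b,
output [3:0] sum,
output cout
);
// Pre-processing: bitwise propagate and generate
wire [3:0] p = a ^ b;
wire [3:0] g = a & b;
// Carry generation, stage 1: prefix combine at distance 1
wire [3:0] g1, p1;
assign g1[0] = g[0];
assign p1[0] = p[0];
assign g1[1] = g[1] | (p[1] & g[0]);
assign p1[1] = p[1] & p[0];
assign g1[2] = g[2] | (p[2] & g[1]);
assign p1[2] = p[2] & p[1];
assign g1[3] = g[3] | (p[3] & g[2]);
assign p1[3] = p[3] & p[2];
// Carry generation, stage 2: prefix combine at distance 2; g2[i] is the carry out of bit i
wire [3:0] g2;
assign g2[0] = g1[0];
assign g2[1] = g1[1];
assign g2[2] = g1[2] | (p1[2] & g1[0]);
assign g2[3] = g1[3] | (p1[3] & g1[1]);
// Final processing: sum bits from propagate signals and incoming carries
wire [3:0] c = {g2[2], g2[1], g2[0], 1'b0};
assign sum = p ^ c;
assign cout = g2[3];
endmodule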
CHAPTER-3
DESIGN AND IMPLEMENTATIONS
3.1 XILINX ISE 14.5
Xilinx software is required for both VHDL and Verilog designers to perform the
synthesis operation. Any simulated code is synthesized and configured on an FPGA. The
process of converting VHDL code into a gate-level netlist is called synthesis. It is the
main part of current design flows.
The device selected is xc3s100e and the language preferred is Verilog. After the
properties are selected, click Next.
• When the Verilog code is completed, check for syntax errors.
• Click on RTL schematic, click on technology, and after that go to the synthesis
report.
• Then correct any errors in the HDL source file. The comments placed above an
error help to fix it.
• After correcting the errors, go to the File menu and press Save to save the file.
• The parsing message should then indicate that the file is error-free and should
display that the file was checked successfully.
• Verilog synthesis tools can create logic circuit structures directly from a Verilog
behavioral description and target them to a selected technology for realization (i.e.,
convert Verilog to actual hardware).
• Using Verilog, design, simulation and synthesis can be performed for anything
from a simple combinational circuit to a complete microprocessor on a chip.
• Verilog HDL is a standard hardware description language with many useful
features for hardware design.
• Verilog HDL is a general-purpose hardware description language that is easy to
learn and use. The syntax of Verilog is similar to the C programming language, so
designers who have experience with C can easily learn Verilog HDL.
• It allows different levels of modeling to be mixed in the same model. Hence
switches, gates, RTL, or behavioral code can be used by the designer to model
hardware, and the designer needs to learn only one language for stimulus and
hierarchical design.
• Verilog HDL is supported by the popular logic synthesis tools. This lets designers
choose the language freely.
• The Programming Language Interface (PLI) is a feature through which C code is
written to interact with Verilog data structures.
3.1.2 VERILOG HDL
Verilog HDL is a hardware description language that can model digital systems at
many abstraction levels, ranging from the algorithmic level to the gate level and the
switch level. The modeled digital system can range in complexity from a simple gate to a
complete digital electronic system. The digital system is described hierarchically, and
timing can be explicitly modeled within the same description.
The Verilog language includes behavioral, dataflow and structural modeling; delays
and waveform-generation mechanisms, including monitoring of responses and
verification, are all modeled in one single language. In addition, the internal design can
be accessed and the simulation run controlled, because the language has a programming
language interface.
This language defines not only the syntax but also the simulation semantics for each
language construct. Therefore, models written in this language can be verified by a
Verilog simulator. The language inherits many of its operators and symbols from the C
programming language. It provides extensive modeling capabilities, some of which are
difficult to understand initially. However, the core language is easy to learn and use, and
is sufficient to model most applications.
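As a small illustration of mixing modeling levels in one description (a generic example, not part of the project's design files), a gate primitive and a behavioral block can coexist in the same module:

module mixed_levels(
input a, b, clk,
output reg q
);
wire w;
and g1 (w, a, b); // structural: built-in gate primitive
always @(posedge clk) // behavioral: register the gate output on the clock edge
q <= w;
endmodule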
3.1.3 VERILOG CAPABILITIES
The following are the major capabilities of Verilog:
• Primitive logic gates, such as AND, OR and NAND, are built into the language.
• High-level language constructs such as if-else, case statements, and loops are
available in the language.
• The concepts of timing and concurrency can be explicitly modeled.
• Enhanced code coverage and data management, with fine-grained control of
information in the source window.
• Some IEEE VHDL 2009 features are supported, including source code encryption,
along with additional support for new VPI types, including packed arrays of struct
variables and nets.
MODELSIM SE FEATURES
• Multi-language, high-performance simulation engine
• Code coverage
• Integrated debug
• SystemC option
• Windows 32-bit support
MODELSIM SE BENEFITS
• High-performance HDL simulation solution for ASIC & FPGA design teams
• Intuitive GUI for efficient interactive or post-simulation debug of RTL and gate-level
designs
• Unified reporting and ranking of code coverage for tracking verification progress
• Sign-off support for popular ASIC libraries
CHAPTER-4
SIMULATION RESULTS
4.1 SIMULATION RESULTS
The design of the approximate 16-bit Wallace multiplier using the 15-4 compressor has
been done in HDL, using Xilinx ISE 14.5. Simulation results show the design of the
overall architecture of the Wallace tree multiplier, as shown in Fig. 9. The area utilization
and power consumption of the multiplier design are obtained through simulation and
tabulated in Table I and Table II. A snapshot of the delay obtained through simulation is
shown in Fig. 10. The processing delay at the final addition stage is reduced by using the
Kogge-Stone adder.
Table I and Table II describe the area utilization and power parameters of the 16-bit
Wallace multiplier. The design shows better results than multipliers using other adders;
it gives less area and lower propagation delay.
FUTURE SCOPE
In future, the performance of the proposed multiplier can be improved further and applied
in applications like video and image processing.
CONCLUSION
The approximate 16×16-bit Wallace tree multiplier using the 15-4 compressor architecture
has been designed, synthesized on a Spartan-3 XC3S100E board and simulated in
Xilinx ISE 14.5. The performance of the proposed multiplier with the Kogge-Stone adder
is compared with the same multiplier architecture using a parallel adder. It can be inferred
that the 16×16 multiplier architecture using the 15-4 compressor with the Kogge-Stone
adder is faster compared to the multiplier with a parallel adder. In future, the performance
of the proposed multiplier can be improved and applied in applications like video and
image processing.
REFERENCES
[1] C. S. Wallace, "A Suggestion for a Fast Multiplier," IEEE Transactions on Computers,
vol. 13, pp. 14-17, 1964.
[3] C.-H. Lin and I.-C. Lin, "High accuracy approximate multiplier with error correction,"
in Proc. IEEE 31st Int. Conf. Comput. Design (ICCD), Oct. 2013, pp. 33-38.
[7] A. Momeni, J. Han, P. Montuschi, and F. Lombardi, "Design and analysis of
approximate compressors for multiplication," IEEE Trans. Comput., vol. 64, no. 4,
pp. 984-994, Apr. 2015.
[8] Teffi Francis, Tera Joseph and Jobin K Antony, "Modified MAC unit for low power
high speed DSP application using multiplier with bypassing technique and optimized
adders," IEEE-31661, 4th ICCCNT, 2013.
[9] Sudheer Kumar Yezerla and B. Rajendra Naik, "Design and estimation of delay,
power and area for parallel prefix adders," in Recent Advances in Engineering and
Computational Sciences (RAECS), pp. 1-6, IEEE, 2014.
[10] Y. Choi, "Parallel Prefix Adder Design," Proc. 17th IEEE Symposium on Computer
Arithmetic, pp. 90-98, June 2005.
[11] Belle W. Y. Wei and Clark D. Thompson, "Area-Time Optimal Adder Design,"
IEEE Transactions on Computers, vol. 39, pp. 666-675, May 1990.
APPENDIX-A
module test_TRgate;
// Inputs
reg a;
reg b;
reg c;
// Outputs
wire p;
wire q;
wire r;
TR_gate uut (
.a(a),
.b(b),
.c(c),
.p(p),
.q(q),
.r(r)
);
initial begin
// Initialize Inputs
a = 0;
b = 0;
c = 0;
#100
a = 0;
b = 1;
c = 0;
#100
a = 0;
b = 1;
c = 1;
#100
a = 1;
b = 0;
c = 1;
#100;
end
endmodule
module test_toffoli;
// Inputs
reg a;
reg b;
reg c;
// Outputs
wire p;
wire q;
wire r;
toffolig uut (
.a(a),
.b(b),
.c(c),
.p(p),
.q(q),
.r(r)
);
initial begin
// Initialize Inputs
a = 0;
b = 0;
c = 0;
#100
a = 0;
b = 1;
c = 0;
#100
a = 1;
b = 1;
c = 0;
#100
a = 1;
b = 1;
c = 1;
#100;
end
endmodule
module test_peres;
// Inputs
reg A;
reg B;
reg C;
// Outputs
wire P1;
wire P2;
wire P3;
Peresgate uut (
.A(A),
.B(B),
.C(C),
.P1(P1),
.P2(P2),
.P3(P3)
);
initial begin
// Initialize Inputs
A = 0;
B = 0;
C = 0;
#100
A = 0;
B = 0;
C = 1;
#100
A = 0;
B = 1;
C = 0;
#100;
end
endmodule
module test_HNGate;
// Inputs
reg A;
reg B;
reg C;
reg D;
// Outputs
wire H1;
wire H2;
wire H3;
wire H4;
HNgate uut (
.A(A),
.B(B),
.C(C),
.D(D),
.H1(H1),
.H2(H2),
.H3(H3),
.H4(H4)
);
initial begin
// Initialize Inputs
A = 0;
B = 0;
C = 0;
D = 0;
#100
A = 1;
B = 0;
C = 1;
D = 0;
#100
A = 0;
B = 1;
C = 1;
D = 0;
#100
A = 1;
B = 0;
C = 1;
D = 1;
#100
A = 0;
B = 1;
C = 0;
D = 1;
#100
A = 0;
B = 1;
C = 1;
D = 1;
#100;
end
endmodule
module test_Fredkingate;
// Inputs
reg A;
reg B;
reg C;
// Outputs
wire Fr1;
wire Fr2;
wire Fr3;
Fredkingate uut ( // instantiation line reconstructed; module name inferred from the testbench
.A(A),
.B(B),
.C(C),
.Fr1(Fr1),
.Fr2(Fr2),
.Fr3(Fr3)
);
initial begin
// Initialize Inputs
A = 0;
B = 0;
C = 0;
#100
A = 0;
B = 0;
C = 0;
#100
A = 0;
B = 0;
C = 0;
#100
A = 0;
B = 0;
C = 0;
#100
A = 0;
B = 0;
C = 0;
#100;
end
endmodule
module test_feynman_gate;
// Inputs
reg A;
reg B;
// Outputs
wire F1;
wire F2;
feynman_gate uut ( // instantiation line reconstructed; module name inferred from the testbench
.A(A),
.B(B),
.F1(F1),
.F2(F2)
);
initial begin
// Initialize Inputs
A = 0;
B = 0;
#100
A = 0;
B = 1;
#100
A = 1;
B = 0;
#100
A = 1;
B = 1;
#100;
end
endmodule
module test_feynman_double;
// Inputs
reg A;
reg B;
reg C;
// Outputs
wire F1;
wire F2;
wire F3;
feynman_double uut (
.A(A),
.B(B),
.C(C),
.F1(F1),
.F2(F2),
.F3(F3)
);
initial begin
// Initialize Inputs
A = 0;
B = 0;
C = 0;
#100
A = 0;
B = 1;
C = 1;
#100
A = 1;
B = 0;
C = 1;
#100
A = 0;
B = 1;
C = 0;
#100;
end
endmodule
module HNgate(
input A, B, C, D,
output H1, H2, H3, H4
);
assign H1 = A;
assign H2 = B;
assign H3 = A ^ B ^ C;
// the fourth output was missing in this copy; the standard HNG definition is assumed:
assign H4 = ((A ^ B) & C) ^ (A & B) ^ D;
endmodule
module notgate(
input a,
output b
);
assign b=~a;
endmodule
module Peresgate(
input A, B, C,
output P1, P2, P3
);
assign P1 = A;
// the remaining outputs were missing in this copy; the standard Peres gate definition is assumed:
assign P2 = A ^ B;
assign P3 = (A & B) ^ C;
endmodule
module toffolig(
input a, b, c,
output p, q, r
);
// Toffoli gate: p and q pass the inputs through; r = (a AND b) XOR c
assign p = a;
assign q = b;
assign r = (a & b) ^ c;
endmodule
module TR_gate(
input a, b, c,
output p, q, r
);
// TR gate: p = a, q = a XOR b, r = (a AND NOT b) XOR c
assign p = a;
assign q = a ^ b;
assign r = (a & (~b)) ^ c;
endmodule
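The testbenches above also instantiate feynman_gate, feynman_double and Fredkingate modules that are not listed in this copy. Minimal sketches based on the standard reversible-gate definitions are given below (assumed, since the original source files are not included):

module feynman_gate(
input A, B,
output F1, F2
);
// Feynman (CNOT) gate
assign F1 = A;
assign F2 = A ^ B;
endmodule

module feynman_double(
input A, B, C,
output F1, F2, F3
);
// double Feynman gate
assign F1 = A;
assign F2 = A ^ B;
assign F3 = A ^ C;
endmodule

module Fredkingate(
input A, B, C,
output Fr1, Fr2, Fr3
);
// Fredkin gate: controlled swap of B and C under control A
assign Fr1 = A;
assign Fr2 = A ? C : B;
assign Fr3 = A ? B : C;
endmodule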