CHAPTER-1
INTRODUCTION
1.1 INTRODUCTION
Multipliers are essential parts of digital systems such as Arithmetic and Logic Units,
Digital Signal Processors, etc. They largely determine the performance of the system in
terms of power, delay and area utilization. Hence there is an increasing demand for
improving multiplier performance. A multiplier consists of three stages: partial product
generation, partial product reduction and a final addition stage. The second stage (partial
product reduction) consumes the most time and power. Various techniques have been
suggested to shorten the multiplier's critical stages. The most popular technique is using
compressors in the partial product reduction stage. A compressor is simply an adder
circuit: it takes a number of equally weighted bits, adds them, and produces sum
signals. Compressors are commonly used with the aim of reducing and accumulating a
large number of inputs into a smaller number in a parallel manner. Their main application
is within a multiplier, where a large number of partial products have to be summed
concurrently. The inner structure of compressors avoids carry propagation: either there
are no carry signals, or they arrive at the same time as the internal values. To reduce the
delay in the second stage, several compressors are needed. Small compressors are useful
for designing small multipliers.
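To make the idea concrete, the simplest compressor is the 3:2 compressor, which is just a full adder viewed as a counter of ones. A minimal Verilog sketch follows (the names here are illustrative, not taken from the project sources):

module compressor_3_2(
input x0, x1, x2, // three equally weighted input bits
output sum, // weight-1 output
output carry // weight-2 output
);
// sum and carry together encode the number of ones among the inputs
assign sum = x0 ^ x1 ^ x2;
assign carry = (x0 & x1) | (x1 & x2) | (x0 & x2);
endmodule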
In multiplier design, different sizes of compressors are required depending upon
the bit size. In this paper, a scheme for delay reduction in a 16-bit Wallace tree multiplier
with a 15:4 compressor is considered. To build the 15:4 compressor, a 5:3 compressor is
used as a basic module. AND gates are used for the generation of partial products: for an
N-bit multiplier, N^2 AND gates are needed. In the partial product reduction phase,
there are three major components, namely the half adder, the full adder and the 5-3
compressor. The final stage of addition is done using a Kogge-Stone adder. Fig. 1 shows
the structure of the 16×16 multiplier. Simulation results show that the approximate
multiplier with compressor using the Kogge-Stone adder achieves higher performance
compared to multipliers with compressor using another adder such as a parallel adder.
This paper is elaborated in the following sections: the design of the approximate 16×16
Wallace tree multiplier is detailed in Section II, and brief notes on the design of the 15-4
compressor, 5-3 compressor and Kogge-Stone adder are described in the subsequent
sections.
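As a sketch of the partial product generation stage just described (the module, parameter and signal names are illustrative, not taken from the project sources), the N^2 AND gates can be written with a generate loop:

module pp_gen #(parameter N = 16)(
input [N-1:0] a, b,
output [N*N-1:0] pp // partial product of row i, column j at index i*N+j
);
genvar i, j;
generate
for (i = 0; i < N; i = i + 1) begin : row
for (j = 0; j < N; j = j + 1) begin : col
// one AND gate per partial-product bit
assign pp[i*N + j] = a[j] & b[i];
end
end
endgenerate
endmodule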
Ever-larger numbers of devices have been integrated onto chips over time. The first
integrated circuits held just a few devices, perhaps as many as ten diodes, transistors,
resistors and capacitors, making it possible to fabricate a small number of logic gates on
a single device. Now known retrospectively as "small-scale integration" (SSI),
developments in the technology led to devices with hundreds of logic gates, known as
large-scale integration (LSI). Present technology has moved far beyond this mark, and
modern microprocessors have many millions of gates and hundreds of millions of
individual transistors.
As of early 2008, billion-transistor processors are commercially available, an
example of which is Intel's Montecito Itanium chip. This is expected to become more
common as semiconductor fabrication moves from the current generation of 65 nm
processes to the next 45 nm generations (while experiencing new challenges such as
increased variation across process corners). Another notable example is NVIDIA's 280
series GPU. This processor is unique in that its 1.4-billion-transistor count, capable of a
teraflop of performance, is almost entirely dedicated to logic (Itanium's transistor count
is largely due to its 24 MB L3 cache). Current designs, as opposed to the earliest devices,
use extensive design automation and automated logic synthesis to lay out the transistors,
enabling higher levels of complexity in the resulting logic functionality. Certain
high-performance logic blocks such as the SRAM cell, however, are still designed by
hand to ensure the highest efficiency (sometimes by bending or breaking established
design rules to obtain the last bit of performance by trading off stability).
WHAT IS VLSI?
VLSI stands for "Very Large-Scale Integration". This is the field which deals with
packing more logic devices into smaller areas.
VLSI
• Simply put, an integrated circuit is many transistors on one chip.
• Design/manufacturing of extremely small, complex circuitry using modified semiconductor material.
• An integrated circuit (IC) may contain millions of transistors, yet be only a few mm in size.
• Applications are wide-ranging: most electronic logic devices.
These advantages of integrated circuits translate into advantages at the system level:
Smaller physical size. Smallness is frequently a benefit in itself: consider portable
televisions or handheld cellular telephones.
Lower power consumption. Substituting a single chip for a handful of standard parts
decreases total power utilization. Reducing power consumption has a ripple effect on the
rest of the system: a smaller, cheaper power supply can be used; since less power
consumption means less heat, a fan may no longer be essential; and a simpler cabinet with
less electromagnetic shielding may be feasible, too.
Reduced cost. Reducing the number of components, the power supply requirements,
cabinet costs, and so on, will inevitably reduce system cost. The ripple effect of integration
is such that the cost of a system built from custom ICs can be less, even though the
individual ICs cost more than the standard parts they replace.
Understanding why integrated circuit technology has such a profound influence on the
design of digital systems requires understanding both the technology of IC manufacturing
and the economics of ICs and digital systems. The growing sophistication of applications
continually pushes the design and manufacturing of integrated circuits and electronic
systems to new levels of complexity. Perhaps the most potent characteristic of this
collection of systems is its variety: as systems become more complex, we build not just a
few general-purpose computers but an ever-wider variety of special-purpose systems.
Our ability to do so is a testimony to our growing mastery of both integrated circuit
manufacturing and design, but the growing demands of customers continue to test the
limits of design and manufacturing.
1.4.1 ASIC
Field-programmable gate arrays (FPGAs) are the modern-day technology for building
a breadboard or prototype from standard parts; programmable logic blocks and
programmable interconnects allow the same FPGA to be used in many different
applications. For smaller designs and/or lower production volumes, FPGAs may be more
cost-effective than an ASIC design, even in production.
An application-specific integrated circuit (ASIC) is an integrated circuit (IC) customized
for a particular use, rather than intended for general-purpose use. A structured ASIC falls
between an FPGA and a standard-cell-based ASIC. Structured ASICs are used mainly for
mid-volume designs. The design task for structured ASICs is to map the circuit onto a
fixed arrangement of known cells.
Among different arithmetic blocks, the multiplier is one of the main blocks, which is
widely used in different applications especially signal processing applications. There are
two general architectures for the multipliers, which are sequential and parallel. While
sequential architectures are low power, their latency is very large. On the other hand,
parallel architectures (such as Wallace tree and Dadda) are fast while having high-power
consumptions. The parallel multipliers are used in high-performance applications where
their large power consumptions may create hot-spot locations on the die. Since the power
consumption and speed are critical parameters in the design of digital circuits, the
optimizations of these parameters for multipliers become critically important. Very often,
the optimization of one parameter is performed considering a constraint for the other
parameter. Specifically, achieving the desired performance (speed) under the limited
power budget of portable systems is a challenging task. In addition, having a given level
of reliability may be another obstacle in reaching the system's target performance.
To meet the power and speed specifications, a variety of methods at different design
abstraction levels have been suggested. Approximate computing approaches are based on
achieving the target specifications at the cost of reducing the computation accuracy. The
approach may be used for applications where there is not a unique answer and/or a set of
answers near the accurate result can be considered acceptable. These applications include
multimedia processing, machine learning, signal processing, and other error resilient
computations. Approximate arithmetic units are mainly based on the simplification of the
arithmetic units’ circuits. There are many prior works focusing on approximate multipliers
which provide higher speeds and lower power consumptions at the cost of lower
accuracies. Almost all of the proposed approximate multipliers are based on having a fixed
level of accuracy during runtime. Runtime accuracy reconfigurability, however, is
considered as a useful feature for providing different levels of quality of service during the
system operation. Here, by reducing the quality (accuracy), the delay and/or power
consumption of the unit may be reduced. In addition, some digital systems, such as general-
purpose processors, may be utilized for both approximate and exact computation modes.
An approach for achieving this feature is to use an approximate unit along with a
corresponding correction unit. The correction unit, however, increases the delay, power,
and area overhead of the circuit. Also, the error correction procedure may require more
than one clock cycle, which could, in turn, slow down the processing further.
In this paper, we present four dual-quality reconfigurable approximate 4:2
compressors, which provide the ability of switching between the exact and approximate
operating modes during the runtime. The compressors may be utilized in the architectures
of dynamic quality configurable parallel multipliers. The basic structures of the proposed
compressors consist of two parts: an approximate part and a supplementary part. In the
approximate mode, only the approximate part is active, whereas in the exact operating
mode, the supplementary part along with some components of the approximate part is
invoked.
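For reference, a widely used exact 4:2 compressor takes four equally weighted bits plus a carry-in and produces a sum, a carry and a carry-out. The sketch below shows the standard textbook structure, not necessarily the exact circuits proposed in the cited work:

module compressor_4_2(
input x1, x2, x3, x4, cin,
output sum, carry, cout
);
// cout depends only on x1..x3, so it is produced without waiting for cin
assign cout = (x1 ^ x2) ? x3 : x1;
assign sum = x1 ^ x2 ^ x3 ^ x4 ^ cin;
assign carry = (x1 ^ x2 ^ x3 ^ x4) ? cin : x4;
endmodule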
APPROXIMATE COMPUTING
"The need for approximate computing is driven by two factors: a fundamental shift in
the nature of computing workloads, and the need for new sources of efficiency," said a
Purdue professor of electrical and computer engineering, who has been working in the
field for about five years. "Computers were first designed to be precise calculators that
solved problems where they were expected to produce an exact numerical value.
However, the demand for computing today is driven by very different applications.
Mobile and embedded devices need to process richer media, and are getting smarter –
understanding us, being more context-aware and having more natural user interfaces. On
the other hand, there is an explosion in digital data searched, interpreted, and mined by
data centers."
"The nature of these computations is different from the traditional computations where
you need a precise answer," said Srimat Chakradhar, department head for Computing
Systems Architecture at NEC Laboratories America, who collaborated with the Purdue
team. "Here, you are looking for the best match since there is no golden answer, or you
are trying to provide results that are of acceptable quality, but you are not trying to be
perfect."
However, today's computers are designed to compute precise results even when it is
not necessary. Approximate computing could endow computers with a capability similar
to the human brain's ability to scale the degree of accuracy needed for a given task. New
findings were detailed in research presented during the IEEE/ACM International
Symposium on Microarchitecture, Dec. 7-11 at the University of California, Davis.
The inability to perform to the required level of accuracy is inherently inefficient and
saps energy.
"If I asked you to divide 500 by 21 and I asked you whether the answer is greater than
one, you would say yes right away," Raghunathan said. "You are doing division but not
to the full accuracy. If I asked you whether it is greater than 30, you would probably take
a little longer, but if I ask you if it's greater than 23, you might have to think even harder.
The application context dictates different levels of effort, and humans are capable of this
scalable approach, but computer software and hardware are not like that. They often
compute to the same level of accuracy all the time."
"In order to have a broad impact we need to be able to apply this technology to
programmable processors," Roy said. "And now we have shown how to design a
programmable processor to perform approximate computing."
The researchers achieved this milestone by altering the "instruction set," which is the
interface between software and hardware. "Quality fields" added to the instruction set allow
the software to tell the hardware the level of accuracy needed for a given task. They have
created a prototype programmable processor called Quora based on this approach.
"You are able to program for quality, and that's the real hallmark of this work," lead
author Venkataramani said. "The hardware can use the quality fields and perform energy
efficient computing, and what we have seen is that we can easily double energy
efficiency."In other recent work, led by Chippa, the Purdue team fabricated an
approximate "accelerator" for recognition and data mining."We have an actual hardware
platform, a silicon chip that we've had fabricated, which is an approximate processor for
recognition and data mining," Raghunathan said. "Approximate computing is far closer to
reality than we thought even a few years ago."
Approximate computing leverages the intrinsic resilience of applications to
inexactness in their computations, to achieve a desirable trade-off between efficiency
(performance or energy) and acceptable quality of results. To broaden the applicability of
approximate computing, we propose quality programmable processors, in which the
notion of quality is explicitly codified in the HW/SW interface, i.e., the instruction set.
The ISA of a quality programmable processor contains instructions associated with
quality fields to specify the accuracy level that must be met during their execution. We
show that this ability to control the accuracy of instruction execution greatly enhances the
scope of approximate computing, allowing it to be applied to larger parts of programs.
The micro-architecture of a quality programmable processor contains hardware
mechanisms that translate the instruction-level quality specifications into energy savings.
Additionally, it may expose the
actual error incurred during the execution of each instruction (which may be less than the
specified limit) back to software. As a first embodiment of quality programmable
processors, we present the design of Quora, an energy efficient, quality programmable
vector processor. Quora utilizes a 3-tiered hierarchy of processing elements that provide
distinctly different energy vs. quality trade-offs, and uses hardware mechanisms based on
precision scaling with error monitoring and compensation to facilitate quality
programmable execution. We evaluate an implementation of Quora with 289 processing
elements in 45nm technology. The results demonstrate that leveraging quality-
programmability leads to 1.05X-1.7X savings in energy for virtually no loss (< 0.5%) in
application output quality, and 1.18X-2.1X energy savings for modest impact (<2.5%) on
output quality. Our work suggests that quality programmable processors are a significant
step towards bringing approximate computing to the mainstream.
CHAPTER-2
LITERATURE SURVEY
filter design. Efficient circuit-level techniques, namely a new carry select adder and
Conditional Capture Flip-Flop (CCFF), were also used to further improve power and
performance. The suggested FIR filter architecture was implemented in 0.25 μm
technology. Experimental results on a 10-tap low-pass CSHM FIR filter showed speed and
power improvements of 19% and 17%, respectively.
Bruce, H., et al., "Power Optimization of Reconfigurable FIR Filter," IEEE Transactions,
2004. This paper describes power optimization techniques applied to a reconfigurable
digital finite impulse response (FIR) filter used in a Universal Mobile Telephone Service
(UMTS) mobile terminal. Various methods of optimization for implementation were
combined to achieve low cost in terms of power consumption. Each optimization method
is described in detail and applied to the reconfigurable filter. The optimization methods
achieved a 78.8% reduction in complexity for the multipliers in the FIR structure. An
automated method for transforming coefficient multipliers into bit-shifts is also
presented.
Süleyman, S.D. and Andrew G.D., "Efficient Implementation of Digital Filters Using
Novel Reconfigurable Multiplier Blocks," IEEE Transactions, 2004. Reconfigurable
Multiplier Blocks (ReMB) offer significant complexity reductions in multiple constant
multiplications in time-multiplexed digital filters. The ReMB technique was employed in
the implementation of a half-band 32-tap FIR filter on both Xilinx Virtex FPGA and
UMC 0.18 μm CMOS technologies. Reference designs had also been built by deploying
standard time-multiplexed architectures and the off-the-shelf Xilinx Core Generator
system for the FPGA design. All designs were then compared for their area and delay
figures. It was shown that the ReMB technique can significantly reduce the area of the
multiplier circuitry and the coefficient store, as well as reducing the delay.
three bits. We have chosen the 5-3 compressor because it is the basic module for the 15-4
compressor. The error rate and error distance of each design are considered.
DESIGN 1
In this design, initially output O2 of the 5-3 compressor is approximated. The logical AND
between inputs X3 and X2 matches the accurate output O2 of the conventional 5-3
compressor with an error rate of 18.75%. The following expressions show design 1 of the
5-3 approximate compressor.
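Since the expressions themselves are not reproduced in this copy, the following Verilog sketch illustrates Design 1 as described in the text: O2 is replaced by the AND of X3 and X2, while O1 and O0 are assumed here to keep their exact values (computed behaviorally):

module approx_5_3_design1(
input X0, X1, X2, X3, X4,
output O2, O1, O0
);
// exact 3-bit count of ones, used for the unmodified outputs
wire [2:0] s = X0 + X1 + X2 + X3 + X4;
assign O2 = X3 & X2; // approximated per Design 1 (18.75% error rate)
assign O1 = s[1]; // assumed exact in Design 1
assign O0 = s[0]; // assumed exact in Design 1
endmodule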
DESIGN 2
In this design, O2 and O1 are approximated and O0 is kept the same as the original
expression. The error distance of all the error cases is either -2 or +2. From the truth table,
it can be noted that the pass rate of O2 is 87.5% when O2 alone is replaced with its
approximate version in the 5-3 compressor. Similarly, the pass rate of O1 is 75% when
compared with the O1 output of the exact 5-3 compressor. The expressions for O2 and O1
are modified to get the minimum error distance. The overall pass rate of this design is
75%. The output of the compressor differs only in eight input cases. In this design, the
critical path is between input X0 and output O0. Four XOR gates are involved in the
critical path. This design has the shortest critical path among the proposed designs.
using Xilinx ISE 14.5 software. The multiplier occupies about 15% of the total coverage
area. The dissipated power and delay of the multiplier are 0.042 μW and 3.125 ns,
respectively.
2.4 OBJECTIVE
Most of the students of Electronics Engineering are exposed to Integrated Circuits
(IC's) at a very basic level, involving SSI (small scale integration) circuits like logic gates
or MSI (medium scale integration) circuits like multiplexers, parity encoders etc. But there
is a lot bigger world out there involving miniaturization at levels so great, that a micrometer
and a microsecond are literally considered huge! This is the world of VLSI - Very Large-
Scale Integration. This chapter aims to introduce Electronics Engineering students to the
possibilities and the work involved in this field.
2.5 THEORY OF PROJECT
To implement the project, our goal is to increase the computing efficiency and to
decrease the power consumption by introducing the Kogge-Stone adder and the Wallace
tree multiplier.
2.6 REQUIRED COMPONENTS
• Xilinx ISE 14.5 software
• Wallace tree multiplier
• Kogge-Stone adder
• 15-4 compressor
• Power supply
• Display
2.7 WALLACE TREE MULTIPLIER
The multiplier is a substantive part of electronic devices and decides the overall
performance of the system. When designing a multiplier, a large amount of power is
consumed and delay is generated. To minimize these disadvantages, adders and
compressors are used. Hence, reducing delay in the multiplier has been a main aim in
enhancing the performance of digital systems like DSP processors, and many attempts
have been made to make multipliers faster. An effective hardware realization of such a
digital system is the Wallace tree, which multiplies two numbers while minimizing the
number of partial products. In vector processors, several multiplications are performed to
obtain data or loop-level parallelism. High processing speed and low power consumption
are the major advantages of this multiplier.
Fig. 3 and Fig. 4 describe the structure and schematic view of the 16-bit multiplier built
with the help of the 15-4 compressor. In this design, each dot denotes a partial product.
From the 13th column onwards, 15-4 compressors are used in this multiplier architecture.
Column number 13 consists of 13 partial products; in order to get 15 partial products, 2
zeros are added. Similarly, in the 14th column, one zero is added. Approximate
compressors are used in the 13th, 14th and 15th columns of the multiplier. The partial
product reduction phase consists of half adders, full adders and 5:3 compressors. When
the number of bits in a column is 2 or 3, half adders and full adders are used in that
column. In the case of a single bit, it is moved further to the subsequent level of that
particular column without any need for further processing. This reduction process is
repeated until only two rows remain. Finally, summation of the last two rows is achieved
using a 4-bit Kogge-Stone adder.
A. 5-3 COMPRESSOR
The 15-4 compressor uses the 5-3 compressor as its basic building block. The 5-3
compressor takes five primary inputs, namely A0, A1, A2, A3, A4, and produces three
outputs, namely
B0, B1, B2. In this compressor, the number of 1s present at the inputs decides the output
of the compressor, which thus works as a counter.
The compression of the given 5 inputs into 3 outputs constitutes the design of the 5-3
compressor. The error rate of the 5-3 compressor is considered. The design equations of
the 5-3 approximate compressor are shown in the following equations. The logic diagram
of the approximate 5-3 compressor is shown in Fig. 6.
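Because the 5-3 compressor behaves as a counter of ones, its exact function can be captured behaviorally in Verilog. The sketch below is illustrative (the gate-level design in the report is built from XOR/AND logic instead):

module compressor_5_3(
input A0, A1, A2, A3, A4,
output B2, B1, B0
);
// the 3-bit output is the binary count of ones among the five inputs (0 to 5)
assign {B2, B1, B0} = A0 + A1 + A2 + A3 + A4;
endmodule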
In 1973, Peter M. Kogge and Harold S. Stone introduced the concept of an efficient
and high-performance adder called the Kogge-Stone adder. It is basically a parallel prefix
adder, known for the fastest addition for a given design time [9], [10].
The functional block diagram and RTL view of a 4-bit Kogge-Stone adder are shown.
Using the ith bits of the given inputs, the propagate signals Pi and generate signals Gi are
calculated. These signals then produce the output carry signals. To minimize the
computation delay, the operation of prefix adders is mainly divided into 3 stages:
A. Pre-Processing
B. Generation of Carry
C. Final Processing
A. Pre-Processing
In this stage, the generate and propagate signals are computed as given by equations 5 and 6.
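The referenced equations are not reproduced in this copy; for a standard prefix adder, the pre-processing signals take the form

Pi = Ai XOR Bi (propagate)
Gi = Ai AND Bi (generate)

where Ai and Bi are the ith bits of the two operands.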
B. Generation of Carry
In this stage, carries are calculated for their corresponding bit positions, and this operation
is executed in a parallel manner. Carry propagate and carry generate signals are used as
intermediate signals. The logic equations for carry propagate and generate are shown
below.
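The equations themselves are not reproduced in this copy; in a standard Kogge-Stone prefix stage, two adjacent (generate, propagate) groups are combined as

G[i:j] = G[i:k] OR (P[i:k] AND G[k-1:j])
P[i:j] = P[i:k] AND P[k-1:j]

so that after log2(N) stages, each bit position holds the group generate (carry) covering all lower bit positions.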
C. Final Processing
In the final processing stage, the sum and carry output bits are computed for the given
input bits; the logic equation for this stage is given below.
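In the standard formulation, the final stage computes Si = Pi XOR Ci-1, where Ci-1 is the group carry into bit i. Putting the three stages together, a minimal 4-bit Kogge-Stone adder sketch in Verilog is given below (with no carry-in, matching the 4-bit final addition described earlier; the module and signal names are illustrative):

module kogge_stone_4(
input [3:0] a, b,
output [3:0] sum,
output cout
);
// Pre-processing: bitwise propagate and generate
wire [3:0] p = a ^ b;
wire [3:0] g = a & b;
// Carry generation, stage 1: prefix combine at distance 1
wire [3:0] g1, p1;
assign g1[0] = g[0];
assign p1[0] = p[0];
assign g1[1] = g[1] | (p[1] & g[0]);
assign p1[1] = p[1] & p[0];
assign g1[2] = g[2] | (p[2] & g[1]);
assign p1[2] = p[2] & p[1];
assign g1[3] = g[3] | (p[3] & g[2]);
assign p1[3] = p[3] & p[2];
// Carry generation, stage 2: prefix combine at distance 2; g2[i] is the carry out of bit i
wire [3:0] g2;
assign g2[0] = g1[0];
assign g2[1] = g1[1];
assign g2[2] = g1[2] | (p1[2] & g1[0]);
assign g2[3] = g1[3] | (p1[3] & g1[1]);
// Final processing: sum bits from propagate signals and incoming carries
wire [3:0] c = {g2[2], g2[1], g2[0], 1'b0};
assign sum = p ^ c;
assign cout = g2[3];
endmodule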
CHAPTER-3
DESIGN AND IMPLEMENTATIONS
3.1 XILINX ISE 14.5
Xilinx software is required for both VHDL and Verilog designers to perform the
synthesis operation. Any simulated code is synthesized and configured on an FPGA. The
process of converting VHDL code into a gate-level netlist is called synthesis. It is the
main part of current design flows.
The device selected is xc3s100e and the language preferred is Verilog. After the
properties are selected, click Next.
• When the Verilog code is completed, check for syntax errors.
• Click on RTL schematic, click on technology, and after that go to the synthesis
report.
• Then correct any errors in the HDL source file. The comments placed above an
error help to fix it.
• After correcting the errors, go to the File menu and press Save to save the file.
• The parsing message should then indicate that the file is error-free and should
display that the file was checked successfully.
• Verilog synthesis tools can create logic circuit structures directly from a Verilog
behavioral description and target them to a selected technology for realization (i.e.,
convert Verilog to actual hardware).
• Using Verilog, design, simulation and synthesis can be performed for anything
from a simple combinational circuit to a complete microprocessor on a chip.
• Verilog HDL is a standard hardware description language with many useful
features for hardware design.
• Verilog HDL is a general-purpose hardware description language that is easy to
learn and use. The syntax of Verilog is similar to the C programming language, so
designers who have experience with C can easily learn Verilog HDL.
• It allows different levels of modeling to be mixed in the same model. Hence
switches, gates, RTL, or behavioral code can be used by the designer to model
hardware, and the designer needs to learn only one language for stimulus and
hierarchical design.
• Verilog HDL is supported by the popular logic synthesis tools. This lets designers
choose the language freely.
• The Programming Language Interface (PLI) is a feature through which C code is
written to interact with Verilog data structures.
3.1.2 VERILOG HDL
Verilog HDL is a hardware description language that can model digital systems at
many abstraction levels, ranging from the algorithmic level to the gate level and the
switch level. The modeled digital system can range in complexity from a simple gate to a
complete digital electronic system. The digital system is described hierarchically, and
timing can be explicitly modeled within the same description.
The Verilog language includes behavioral, dataflow and structural modeling; delays
and waveform-generation mechanisms, including monitoring of responses and
verification, are all modeled in one single language. In addition, the internal design can
be accessed and the simulation run controlled, because the language has a programming
language interface.
This language defines not only the syntax but also the simulation semantics for each
language construct. Therefore, models written in this language can be verified by a
Verilog simulator. The language inherits many of its operators and symbols from the C
programming language. It provides extensive modeling capabilities, some of which are
difficult to understand initially. However, the core language is easy to learn and use, and
is sufficient to model most applications.
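As a small illustration of mixing modeling levels in one description (a generic example, not part of the project's design files), a gate primitive and a behavioral block can coexist in the same module:

module mixed_levels(
input a, b, clk,
output reg q
);
wire w;
and g1 (w, a, b); // structural: built-in gate primitive
always @(posedge clk) // behavioral: register the gate output on the clock edge
q <= w;
endmodule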
3.1.3 VERILOG CAPABILITIES
The following are the major capabilities of Verilog:
• Primitive logic gates, such as AND, OR and NAND, are built into the language.
• High-level language constructs such as if-else, case statements, and loops are
available in the language.
• The concepts of timing and concurrency can be explicitly modeled.
• Enhanced code coverage and data management, with fine-grained control of
information in the source window.
• Some IEEE VHDL 2009 features are supported, including source code encryption,
along with additional support for new VPI types, including packed arrays of struct
variables and nets.
MODELSIM SE FEATURES
• Multi-language, high-performance simulation engine
• Code coverage
• Integrated debug
• SystemC option
• Windows 32-bit support
MODELSIM SE BENEFITS
• High-performance HDL simulation solution for ASIC & FPGA design teams
• Intuitive GUI for efficient interactive or post-simulation debug of RTL and gate-level
designs
• Unified reporting and ranking of code coverage for tracking verification progress
• Sign-off support for popular ASIC libraries
CHAPTER-4
SIMULATION RESULTS
4.1 SIMULATION RESULTS
The design of the approximate 16-bit Wallace multiplier using the 15-4 compressor has
been done in HDL, using Xilinx ISE 14.5. Simulation results show the design of the
overall architecture of the Wallace tree multiplier, as shown in Fig. 9. The area utilization
and power consumption of the multiplier design are obtained through simulation and
tabulated in Table I and Table II. A snapshot of the delay obtained through simulation is
shown in Fig. 10. The processing delay at the final addition stage is reduced by using the
Kogge-Stone adder.
Table I and Table II describe the area utilization and power parameters of the 16-bit
Wallace multiplier. The design shows better results than multipliers using other adders;
it gives less area and lower propagation delay.
FUTURE SCOPE
In future, the performance of the proposed multiplier can be improved further and applied
in applications like video and image processing.
CONCLUSION
The approximate 16×16-bit Wallace tree multiplier using the 15-4 compressor architecture
has been designed, synthesized on a Spartan-3 XC3S100E board and simulated in
Xilinx ISE 14.5. The performance of the proposed multiplier with the Kogge-Stone adder
is compared with the same multiplier architecture using a parallel adder. It can be inferred
that the 16×16 multiplier architecture using the 15-4 compressor with the Kogge-Stone
adder is faster compared to the multiplier with a parallel adder. In future, the performance
of the proposed multiplier can be improved and applied in applications like video and
image processing.
REFERENCES
[1] C. S. Wallace, "A Suggestion for a Fast Multiplier," IEEE Transactions on Computers,
vol. 13, pp. 14-17, 1964.
[3] C.-H. Lin and I.-C. Lin, "High accuracy approximate multiplier with error correction,"
in Proc. IEEE 31st Int. Conf. Comput. Design (ICCD), Oct. 2013, pp. 33-38.
[7] A. Momeni, J. Han, P. Montuschi, and F. Lombardi, "Design and analysis of
approximate compressors for multiplication," IEEE Trans. Comput., vol. 64, no. 4,
pp. 984-994, Apr. 2015.
[8] Teffi Francis, Tera Joseph and Jobin K Antony, "Modified MAC unit for low power
high speed DSP application using multiplier with bypassing technique and optimized
adders," IEEE-31661, 4th ICCCNT, 2013.
[9] Sudheer Kumar Yezerla and B. Rajendra Naik, "Design and estimation of delay,
power and area for parallel prefix adders," in Recent Advances in Engineering and
Computational Sciences (RAECS), pp. 1-6, IEEE, 2014.
[10] Y. Choi, "Parallel Prefix Adder Design," Proc. 17th IEEE Symposium on Computer
Arithmetic, pp. 90-98, June 2005.
[11] Belle W. Y. Wei and Clark D. Thompson, "Area-Time Optimal Adder Design,"
IEEE Transactions on Computers, vol. 39, pp. 666-675, May 1990.
APPENDIX-A
module test_TRgate;
// Inputs
reg a;
reg b;
reg c;
// Outputs
wire p;
wire q;
wire r;
TR_gate uut (
.a(a),
.b(b),
.c(c),
.p(p),
.q(q),
.r(r)
);
initial begin
// Initialize Inputs
a = 0;
b = 0;
c = 0;
#100
a = 0;
b = 1;
c = 0;
#100
a = 0;
b = 1;
c = 1;
#100
a = 1;
b = 0;
c = 1;
#100;
end
endmodule
module test_toffoli;
// Inputs
reg a;
reg b;
reg c;
// Outputs
wire p;
wire q;
wire r;
toffolig uut (
.a(a),
.b(b),
.c(c),
.p(p),
.q(q),
.r(r)
);
initial begin
// Initialize Inputs
a = 0;
b = 0;
c = 0;
#100
a = 0;
b = 1;
c = 0;
#100
a = 1;
b = 1;
c = 0;
#100
a = 1;
b = 1;
c = 1;
#100;
end
endmodule
module test_peres;
// Inputs
reg A;
reg B;
reg C;
// Outputs
wire P1;
wire P2;
wire P3;
Peresgate uut (
.A(A),
.B(B),
.C(C),
.P1(P1),
.P2(P2),
.P3(P3)
);
initial begin
// Initialize Inputs
A = 0;
B = 0;
C = 0;
#100
A = 0;
B = 0;
C = 1;
#100
A = 0;
B = 1;
C = 0;
#100;
end
endmodule
module test_HNGate;
// Inputs
reg A;
reg B;
reg C;
reg D;
// Outputs
wire H1;
wire H2;
wire H3;
wire H4;
HNgate uut (
.A(A),
.B(B),
.C(C),
.D(D),
.H1(H1),
.H2(H2),
.H3(H3),
.H4(H4)
);
initial begin
// Initialize Inputs
A = 0;
B = 0;
C = 0;
D = 0;
#100
A = 1;
B = 0;
C = 1;
D = 0;
#100
A = 0;
B = 1;
C = 1;
D = 0;
#100
A = 1;
B = 0;
C = 1;
D = 1;
#100
A = 0;
B = 1;
C = 0;
D = 1;
#100
A = 0;
B = 1;
C = 1;
D = 1;
#100;
end
endmodule
module test_Fredkingate;
// Inputs
reg A;
reg B;
reg C;
// Outputs
wire Fr1;
wire Fr2;
wire Fr3;
Fredkingate uut ( // instantiation line reconstructed; module name inferred from the testbench
.A(A),
.B(B),
.C(C),
.Fr1(Fr1),
.Fr2(Fr2),
.Fr3(Fr3)
);
initial begin
// Initialize Inputs
A = 0;
B = 0;
C = 0;
#100
A = 0;
B = 0;
C = 0;
#100
A = 0;
B = 0;
C = 0;
#100
A = 0;
B = 0;
C = 0;
#100
A = 0;
B = 0;
C = 0;
#100;
end
endmodule
module test_feynman_gate;
// Inputs
reg A;
reg B;
// Outputs
wire F1;
wire F2;
feynman_gate uut ( // instantiation line reconstructed; module name inferred from the testbench
.A(A),
.B(B),
.F1(F1),
.F2(F2)
);
initial begin
// Initialize Inputs
A = 0;
B = 0;
#100
A = 0;
B = 1;
#100
A = 1;
B = 0;
#100
A = 1;
B = 1;
#100;
end
endmodule
module test_feynman_double;
// Inputs
reg A;
reg B;
reg C;
// Outputs
wire F1;
wire F2;
wire F3;
feynman_double uut (
.A(A),
.B(B),
.C(C),
.F1(F1),
.F2(F2),
.F3(F3)
);
initial begin
// Initialize Inputs
A = 0;
B = 0;
C = 0;
#100
A = 0;
B = 1;
C = 1;
#100
A = 1;
B = 0;
C = 1;
#100
A = 0;
B = 1;
C = 0;
#100;
end
endmodule
module HNgate(
input A, B, C, D,
output H1, H2, H3, H4
);
assign H1 = A;
assign H2 = B;
assign H3 = A ^ B ^ C;
// the fourth output was missing in this copy; the standard HNG definition is assumed:
assign H4 = ((A ^ B) & C) ^ (A & B) ^ D;
endmodule
module notgate(
input a,
output b
);
assign b=~a;
endmodule
module Peresgate(
input A, B, C,
output P1, P2, P3
);
assign P1 = A;
// the remaining outputs were missing in this copy; the standard Peres gate definition is assumed:
assign P2 = A ^ B;
assign P3 = (A & B) ^ C;
endmodule
module toffolig(
input a, b, c,
output p, q, r
);
// Toffoli gate: p and q pass the inputs through; r = (a AND b) XOR c
assign p = a;
assign q = b;
assign r = (a & b) ^ c;
endmodule
module TR_gate(
input a, b, c,
output p, q, r
);
// TR gate: p = a, q = a XOR b, r = (a AND NOT b) XOR c
assign p = a;
assign q = a ^ b;
assign r = (a & (~b)) ^ c;
endmodule
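The testbenches above also instantiate feynman_gate, feynman_double and Fredkingate modules that are not listed in this copy. Minimal sketches based on the standard reversible-gate definitions are given below (assumed, since the original source files are not included):

module feynman_gate(
input A, B,
output F1, F2
);
// Feynman (CNOT) gate
assign F1 = A;
assign F2 = A ^ B;
endmodule

module feynman_double(
input A, B, C,
output F1, F2, F3
);
// double Feynman gate
assign F1 = A;
assign F2 = A ^ B;
assign F3 = A ^ C;
endmodule

module Fredkingate(
input A, B, C,
output Fr1, Fr2, Fr3
);
// Fredkin gate: controlled swap of B and C under control A
assign Fr1 = A;
assign Fr2 = A ? C : B;
assign Fr3 = A ? B : C;
endmodule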