A Low Power Selective Median Filter Design
Master of Technology
In
Computer Science and Engineering
By
Radhamadhab Dalai
Under The Guidance of
&
PIN-769008 (ORISSA)
JUNE-2008
National Institute of Technology Rourkela
CERTIFICATE
This is to certify that the thesis entitled, “A Low Power Selective Median Filter Design”,
submitted by Sri. Radhamadhab Dalai (Roll No. 20606002) in partial fulfillment of the
requirements for the award of Master of Technology Degree in Computer Science and
Engineering at National Institute of Technology, Rourkela (Deemed University) is an
authentic work carried out by him under our supervision and guidance.
To the best of our knowledge, the matter embodied in the thesis has not been submitted
to any other University / Institute for the award of any Degree or Diploma.
Acknowledgment
No thesis is created entirely by an individual; many people have helped to create this
thesis, and each of their contributions has been valuable. I express my sincere gratitude
to my thesis supervisors, Dr. Banshidhar Majhi, Professor and Head of the Department,
Computer Science and Engineering, and Dr. Kamala Kanta Mahapatra, Professor,
Electronics and Communication Engineering, for their kind and valuable guidance towards
the completion of the thesis work.
I would also like to take this opportunity to express my gratitude and sincere thanks to my
honorable teachers Dr. S. K. Rath, Dr. D. P. Mahapatra, SL. Bibhudatta Sahoo, Prof. R.
Baliarsingh, Dr. A. K. Turuk and Dr. S. K. Jena for their invaluable advice, constant help,
encouragement, inspiration and blessings.
Submitting this thesis would have been a Herculean job without the constant
help, encouragement, support and suggestions from my friends, especially Atanu S. Bal,
Tankadhar Mahanta, Saurav Ganguli, Jitendra K. Singh, Arun Kumar, Prem S. Tondon, E
Ashwin Kumar and K S Babu, who gave their time to help. Although it is difficult to record
my appreciation to each and every one of them in this small space, I would feel guilty if I
missed the opportunity to thank Tarun Sureja, Arindam Makur, Yogesh Bani, Pushpendra
Chandra, G Raghav Rao, and others. I will cherish your memories for years to come.
Radhamadhab Dalai
Roll No. 20606012
M. Tech, Department Of Computer Science and Engineering
Contents
1 Introduction
1.5 Motivation
2 Literature Survey
3 Processor Design
4.1 Median filter Implementation
4.3 Simple memory mapped Control Unit design for Median filter
References
Dissemination of Work
LIST OF FIGURES
Figure No. Description
Figure 2.11 Architecture of Memory Based ROF Filter
Figure 2.12 Noisy LENA Image
Figure 2.13 Filtered Image by MVC [4]
Figure 3.1 Top level design of CPU
Figure 3.2 16-bit Instruction Set
Figure 3.3 Block Diagram of CPU
Figure 3.4 Test bench waveform-1 for CPU simulation
Figure 3.5 Test bench waveform-2 for CPU simulation
Figure 3.6 3 x 3 Window of a Test Pixel
Figure 3.7 A Simple Image Processor
Figure 3.8 User Defined function blocks for Median filter
Figure 3.9 (a) Original Image (b) Image corrupted by 10% salt & pepper noise (c) Filtered Image
Figure 3.10 Image Corrupted by 50% Salt and Pepper Noise and filtered image
Figure 3.11 Simulink Block Diagram of median calculator of nine pixel values
Figure 3.12 Subsystem using XILINX Block set
LIST OF TABLES
ABSTRACT
A selective median filter that consumes less power has been designed, and different
logic schemes for majority bit evaluation have been applied and simulated in VHDL. The
filter is called selective because an edge pixel detector [2] is used to select the pixels
that are to be processed through the median filter. For median value calculation, the
pixel values of a 3 x 3 window are sorted using a majority bit circuit [4]. Different
majority bit calculation methods have been implemented, and the resulting sorting circuit
has been analyzed for power. In this work a general median filter based on a binary
sorting method known as the Majority Voting Circuit (MVC) has been designed in VHDL
and optimized using Synopsys tools in a 0.13 µm CMOS technology. The digital
design of the sorting circuit saves approximately 60% of power, comprising cell leakage
and dynamic power, compared to a mixed signal design of a floating gate based majority
bit median filter [4]. Before the median filter is applied to a pixel, a double derivative
filter [2] is applied to check whether it is an edge pixel or not. Overall, this is a digital
design of a mixed filter that preserves edges and removes noise as well. Low power
techniques at the logic level and the algorithmic level have been embedded into this work.
We have also designed a small microprocessor in VHDL. Later, a memory based Control
Unit (the memory stores the image) for single median value evaluation has been designed
and simulated in XILINX. For the sorting circuit a common logic based circuit
(component) has been put forward. The power, latency (delay) and area of the whole
design have been compared with those of other designs.
Chapter 1
Introduction
A selective median filter is a mixed filter which removes spike noise (impulsive noise,
most commonly called salt and pepper noise) from a noisy image while preserving sharp
edges. It is called ‘selective’ because the filter first checks whether each pixel is noisy
or not. For this purpose we have used a double derivative filter (Laplace filter), which
acts as a noise detector. If the pixel is found to be noisy, we apply a general median
filter as described in [2]. The main objective of both filters is to denoise the noisy image
and restore the original one while keeping the high intensity details intact. The median
filter is a major denoising technique used in image restoration, as described below. The
following subsections also describe the degradation process and some common noise
models, along with the double derivative technique used for sharpening images.
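The selective scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the detection rule (absolute 4-neighbour Laplacian response greater than a threshold), the function names and the threshold value are hypothetical and are not taken from [2]; the direction of the comparison should be swapped if the detector is defined the other way.

```python
from statistics import median


def laplacian(img, r, c):
    """4-neighbour discrete Laplacian at an interior pixel (r, c)."""
    return (img[r + 1][c] + img[r - 1][c] + img[r][c + 1] + img[r][c - 1]
            - 4 * img[r][c])


def selective_median(img, threshold):
    """Replace only the pixels flagged as noisy by the Laplacian test.

    Assumption: a pixel is flagged when |Laplacian| > threshold.
    Border pixels are copied unchanged to keep the sketch short.
    """
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            if abs(laplacian(img, r, c)) > threshold:
                window = [img[r + dr][c + dc]
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
                out[r][c] = median(window)
    return out


# A flat 5 x 5 image with a single "salt" impulse in the centre.
noisy = [[100] * 5 for _ in range(5)]
noisy[2][2] = 255
restored = selective_median(noisy, threshold=200)
print(restored[2][2])   # the impulse is replaced by the window median, 100
```

Only the centre pixel passes the detector here, so the median is computed once instead of nine times; skipping the sort for unflagged pixels is the source of the saving in the selective approach.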
The ultimate goal of a restoration technique is to improve an image in some predefined
sense. Although there are areas of overlap, image enhancement is largely a subjective
process, while image restoration is for the most part an objective process. Restoration
attempts to reconstruct or recover an image that has been degraded by using prior
knowledge of the degradation phenomenon. Thus restoration techniques are oriented
toward modeling the degradation and applying the inverse process to recover the original
image.
The general degradation process is shown in figure 1. In this chapter the degradation
process is modeled as a degradation function that, together with an additive noise term,
operates on an input image f(x, y) to produce a degraded image g(x, y). Given g(x, y),
some knowledge about the degradation function h, and some knowledge about the additive
noise term η(x, y), the objective of restoration is to obtain an estimate of the original
image.
[Figure 1: Model of the image degradation and restoration process. The input image
f(x, y) passes through the degradation function H, the noise η(x, y) is added, and the
restoration filters produce the estimate f'(x, y).]
Images are sometimes degraded by external noise, and such noise is additive in nature.
The general degradation process of an image is given below.
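For reference, the degradation model described in words above is usually written in the following standard form. The convolution symbol and the spatial representation h(x, y) of the degradation function are assumptions based on common textbook notation, since the corresponding equation is not reproduced in this text.

```latex
\[
g(x, y) = h(x, y) * f(x, y) + \eta(x, y)
\]
```

When no degradation function is present, this reduces to the purely additive form g(x, y) = f(x, y) + η(x, y).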
1.1.2. Some Noise Models
The main sources of noise in digital images arise during image acquisition (digitization)
and transmission. The performance of imaging sensors is affected by a variety of factors,
such as environmental conditions during image acquisition and the quality of the sensing
elements themselves. For example, sensor temperature and light levels are major factors
that affect the amount of noise in the resulting image. Images are corrupted during
transmission principally due to interference in the channel used for transmission. For
example, an image transmitted over a wireless network may be corrupted by lightning or
other atmospheric disturbances. Images acquired by optical, electro-optical or electronic
means are mostly degraded by the sensing environment. The degradation may be in the
form of sensor noise.
Some basic noise models are defined over the spatial domain and are characterized by a
probability density function (PDF), as given below.
i. Impulsive Noise
Impulse noise, shown in figure 1.2, is characterized by a noise spike replacing the actual
pixel value. Impulse noise is further divided into two classes: random valued impulse
noise (RVIN) and salt and pepper noise (SPN). In RVIN, the impulse value at a particular
pixel may be any random value within a particular interval. In SPN the impulses take
either the minimum or the maximum value allowed for the intensity, for example 0 and
255 in the case of an 8-bit image. The PDF of impulse noise is given by equation (1.4).
\[
p(z) =
\begin{cases}
P_a & \text{when } z = a \\
P_b & \text{when } z = b \\
0 & \text{otherwise}
\end{cases}
\tag{1.4}
\]
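The impulse PDF of equation (1.4) can be turned into a small noise generator. The sketch below is illustrative only: the function name, the default levels a = 0 and b = 255, and the 5% probabilities are assumptions chosen for the example.

```python
import random


def add_salt_pepper(img, p_a, p_b, a=0, b=255, seed=None):
    """Corrupt each pixel independently: value a with probability p_a,
    value b with probability p_b, unchanged otherwise (equation 1.4)."""
    rng = random.Random(seed)
    out = []
    for row in img:
        new_row = []
        for pixel in row:
            u = rng.random()
            if u < p_a:
                new_row.append(a)          # "pepper" impulse
            elif u < p_a + p_b:
                new_row.append(b)          # "salt" impulse
            else:
                new_row.append(pixel)      # pixel left untouched
        out.append(new_row)
    return out


clean = [[128] * 100 for _ in range(100)]
noisy = add_salt_pepper(clean, p_a=0.05, p_b=0.05, seed=1)
corrupted = sum(p != 128 for row in noisy for p in row)
print(corrupted / 10000)   # fraction of corrupted pixels, close to 0.10
```

A generator of this kind is what produces test images such as the "10% salt and pepper" case listed among the figures.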
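[Figure 1.2: PDF of impulse noise, with spikes of height Pa at z = a and Pb at z = b]
[Figure: PDF of uniform noise, with constant value 1/(b-a) between z = a and z = b]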
iii. Gaussian noise
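Due to their mathematical tractability in both the spatial and frequency domains, Gaussian
noise models are used frequently in practice. The PDF of a Gaussian random variable z is
given by
\[
p(z) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(z-\mu)^2 / 2\sigma^2}
\tag{1.6}
\]
where µ is the mean of z and σ is its standard deviation.
[Figure: PDF of Gaussian noise, with peak value 1/(√(2π) σ) at z = µ and the points
µ − σ and µ + σ marked on the z axis]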
Another type of image degradation is due to additive noise, in which random values are
added to the intensity values. Denoising involves the application of some filtering
technique to recover the true image. The denoising algorithms [1] vary greatly depending
on the type of noise present in the image. Each type of noise is characterized by its own
noise model. Each noise model corresponds to a probability density function which
describes the distribution of noise within the image.
When the only degradation present in an image is noise, equation (1.2) reduces to the sum
of the original image and the noise term. The filters commonly used in this case are:
• Mean Filters
• Order-Statistics Filters
• Adaptive Filters
Mean filters
In this method the middle pixel value of the filter window is replaced with the arithmetic
mean, geometric mean, harmonic mean or contra-harmonic mean of all the pixel values
within the filter window. A mean filter simply smooths local variations in an image.
Noise is reduced as a result of this smoothing, but edges within the image get blurred.
Examples: Arithmetic Mean Filter, Geometric Mean Filter, Contra-harmonic Mean
Filter
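The edge blurring caused by a mean filter is easy to see on a small example. The following sketch applies a 3 x 3 arithmetic mean to a vertical step edge; the function name and the test image are illustrative assumptions, and border pixels are simply copied.

```python
def arithmetic_mean_filter(img):
    """3 x 3 arithmetic mean filter; border pixels are left unchanged."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            total = sum(img[r + dr][c + dc]
                        for dr in (-1, 0, 1) for dc in (-1, 0, 1))
            out[r][c] = total / 9
    return out


# A vertical step edge: three dark columns followed by three bright columns.
step = [[0, 0, 0, 90, 90, 90] for _ in range(5)]
smoothed = arithmetic_mean_filter(step)
print(smoothed[2])   # [0, 0.0, 30.0, 60.0, 90.0, 90]
```

The sharp transition from 0 to 90 is spread over two intermediate values (30 and 60), which is the blurring that order-statistics filters such as the median filter avoid.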
Adaptive Filters
Adaptive filters change their behavior based on the statistical characteristics of the
image inside the filter window. Adaptive filter performance is usually superior to that of
non-adaptive counterparts, but the improved performance comes at the cost of added filter
complexity. Mean and variance are two important statistical measures on which adaptive
filters can be based. For example, if the local variance is high compared to the overall
image variance, the filter should return a value close to the present value, because high
variance is usually associated with edges, and edges should be preserved.
enhancement results. Usually these filters along with other enhancement or restoring
filters yield impressive results.
The Laplacian of an image function f(x, y) of two variables x and y is defined as
\[
\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}
\tag{1.11}
\]
In the x-direction
\[
\frac{\partial^2 f}{\partial x^2} = f(x+1, y) + f(x-1, y) - 2f(x, y)
\tag{1.12}
\]
In the y-direction
\[
\frac{\partial^2 f}{\partial y^2} = f(x, y+1) + f(x, y-1) - 2f(x, y)
\tag{1.13}
\]
Figure 1.5 (a) The Original Image and (b) Filtered Image Applying Laplacian
As we can see from the above figure, the Laplacian of the image produces sharp edges; it
is useful when we need to preserve the sharp edges in a noisy image.
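Equations (1.11) to (1.13) translate directly into code. The sketch below evaluates the discrete Laplacian at one interior pixel; the function name and the two 3 x 3 test images are illustrative assumptions.

```python
def laplacian(f, x, y):
    """Discrete Laplacian of image f at interior pixel (x, y)."""
    d2x = f[x + 1][y] + f[x - 1][y] - 2 * f[x][y]   # equation (1.12)
    d2y = f[x][y + 1] + f[x][y - 1] - 2 * f[x][y]   # equation (1.13)
    return d2x + d2y                                 # equation (1.11)


flat = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
spike = [[10, 10, 10], [10, 50, 10], [10, 10, 10]]
print(laplacian(flat, 1, 1))    # 0: no intensity change in a flat region
print(laplacian(spike, 1, 1))   # -160: strong response at an isolated spike
```

The response is zero in flat regions and large in magnitude at abrupt intensity changes, which is why the operator can serve as a detector for edges and impulses.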
1.2. Low power VLSI Design
VLSI low power design problems can be broadly classified into two categories: analysis
and optimization. Analysis problems are concerned with the accurate estimation of power
or energy dissipation at different phases of the design process. Optimization is the
process of generating the best design for a given optimization goal without violating the
design specification. Analysis techniques also serve design optimization, but a major
criterion to be considered is the impact on circuit delay, which affects the throughput and
performance of the circuit. Other factors are design cycle time, reliability, testability,
quality and reusability. Power efficiency cannot be achieved without yielding on one or
more of these factors, because they trade off against one another.
Another factor driving low power chips is the increased market demand for portable
consumer electronics powered by batteries. The craving for smaller, lighter and more
durable electronic products indirectly translates to low power requirements. The power
dissipation of high performance computing systems and microprocessors is now
approaching a considerable amount (in terms of watts). Power dissipation has a direct
impact on the packaging cost of the chip and the cooling cost of the system. Some
personal computer CPUs require cooling fans mounted directly on the chip due to high
power dissipation.
Another major demand for low power chips and systems comes from
environmental concerns. A study by the American Council for an Energy-Efficient Economy
estimated that the energy usage of office equipment doubles in the period 1993 to 2000,
making it the fastest-growing electricity load in the commercial sector.
The major sources of power dissipation and the techniques to reduce them are given below.
i. Short-Circuit Power
In a static CMOS circuit, there are two complementary networks: the p-network (pull-up
network) and the n-network (pull-down network), as shown in figure 1.6. The logic
functions of the two networks are complementary to each other. Normally, when the input
and output states are stable, only one network is turned on and connects the output either
to the power supply node or to the ground node, while the other network is turned off and
blocks the current from flowing. Short-circuit current exists during transitions, when one
network is turned on while the other network is still active. For example, suppose the
input signal to an inverter is switching from 0 to Vdd. During this transition, there
exists a short time interval where the input voltage is larger than Vtn but less than
Vdd - |Vtp|. During this time both the n-network and the p-network conduct, so a
short-circuit current flows from the supply to ground. The short-circuit current typically
consumes less than 10% of the total power in a "well-designed" circuit [19].
[Figure 1.6: Static CMOS circuit with complementary P-network (pull-up) and N-network
(pull-down), input Vin and output Vout]
The leakage current depends on temperature; it is proportional to the leakage area and
depends exponentially on the threshold voltage. Sub-threshold leakage and reverse-biased
junction leakage both increase dramatically with temperature and are independent of the
operating voltage for a given fabrication process.
The leakage current is on the order of pico-amperes, but it increases as the threshold
voltage is reduced. In some cases, such as large RAMs, the leakage current is one of the
main concerns. The leakage current is currently not a severe problem in most digital
designs. However, the power consumed by leakage current can be as large as the power
consumed by the switching current for 0.06 µm technology. The use of multiple
threshold voltages can reduce the leakage current in deep-submicron technology.
Leakage current is difficult to predict, measure or optimize.
Figure 1.7 Leakage Current types (a) Reverse Biased Current (I reverse) (b) Sub-threshold Current (I sub)
Generally, leakage current serves no useful purpose, but some circuits do exploit it for
intended operations, such as power-on reset signal generation. The leakage power
problem mainly appears in very low frequency circuits or in ones with “sleep modes” where
dynamic activity is suppressed.
\[
P_{switching} = \alpha\, C_L\, V_{dd}^2\, f
\tag{1.15}
\]
where α is the switching activity factor, C_L is the load capacitance, f is the clock
frequency, and Vdd is the supply voltage. Equation (1.15) shows that the switching power
depends on a few quantities that are readily observable and measurable in CMOS circuits.
It is applicable to almost every digital circuit and hence provides guidelines for low
power design. The power consumed by the switching current is the dominant part of the
power consumption, and reducing it is the focus of most low power design techniques. For
large capacitance circuits, reducing the frequency is the best way to reduce the switching
power. The use of different coding methods, number representation systems, counting
sequences and data representations can directly alter the switching frequency of the
design, which alters the switching power. The best method of reducing the switching
frequency is to eliminate logic switching that is not necessary for computation.
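The dependence of switching power on its four parameters can be explored numerically. The sketch below assumes the form P = α · C_L · Vdd² · f for equation (1.15), consistent with the parameters named above; the function name and all numeric values are hypothetical and chosen only for illustration.

```python
def switching_power(alpha, c_load, vdd, freq):
    """Switching power in watts for SI inputs: alpha * C_L * Vdd^2 * f."""
    return alpha * c_load * vdd ** 2 * freq


# Hypothetical operating point: 1 pF load, 5 V supply, 40 MHz clock.
base = switching_power(alpha=0.5, c_load=1e-12, vdd=5.0, freq=40e6)
half_v = switching_power(alpha=0.5, c_load=1e-12, vdd=2.5, freq=40e6)
half_f = switching_power(alpha=0.5, c_load=1e-12, vdd=5.0, freq=20e6)

print(half_v / base)   # halving the supply voltage quarters the power
print(half_f / base)   # halving the frequency only halves it
```

The quadratic dependence on the supply voltage is the reason why many of the techniques discussed in this chapter aim at creating timing slack that can be traded for a lower Vdd.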
[Figure: Low power design techniques at different levels of abstraction]
System Level: Partitioning, Power Down
Algorithm Level: Algorithm optimizing
Architecture Level: Parallelism, Pipelining, Voltage scaling
Logic Level: Data encoding, Efficient Logic styles
Circuit Level: Transistor sizing
i. System Level
A system typically consists of both hardware and software components, both of which affect
the power consumption. System design includes hardware/software partitioning,
hardware platform selection (application-specific or general-purpose processors),
the resource sharing (scheduling) strategy, etc.
At the system level, power can be reduced through faster code and instruction level
optimization using different software tools. Power-down and clock gating are two of the
most widely used low power techniques at the system level. Inactive hardware units are
shut down to save power. The clock drivers, which often consume 30-40% of the total
power, can be gated to reduce switching activity, as illustrated in figure 1.9. With clock
gating, the clock to blocks that are not performing any operation is disabled, so those
blocks do not switch.
Power-down can be extended to the whole system. This is called sleep mode and is
widely used in low power processors. The system is designed for peak performance, but
the computation requirement varies with time. Adapting the clock frequency and/or the
supply voltage (dynamic voltage scaling) to match the performance constraints is another
low power technique. A lower performance requirement during certain time intervals can
be used to reduce the power supply voltage. This requires either a feedback mechanism
(load monitoring and voltage control) or predetermined timing to activate the voltage
down-scaling.
[Figure 1.9: Clock gating, with the clock to a functional unit gated by an enable signal]
Asynchronous circuit design can also be used as a low power design technique.
Asynchronous designs have many attractive features, such as non-global clocking,
automatic power-down, no spurious transitions and low peak current. It is easy to reduce
the power consumption further by combining the asynchronous design technique with other
low power techniques, for instance dynamic voltage scaling, as shown in figure 1.10.
[Figure 1.10: Asynchronous design combined with dynamic voltage scaling: an input buffer
feeding a processing unit that produces the output]
Another algorithmic level technique is algorithmic transformation. Loop unrolling is one
such transformation; it aims to enhance speed, and it can also be used to reduce power
consumption. With loop unrolling the critical path can be shortened, and hence voltage
scaling can be applied to reduce the power consumption.
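[Figure: Loop unrolling of a recursive filter. The original structure uses the
coefficients b0 and a1 and a delay D; the unrolled structure uses the coefficients b0,
a1 b0 and a1^2 and a delay 2D.]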
In figure 2.5, the unrolling shortens the critical path and allows a voltage reduction of
26% [18 – 23]. This reduces the power consumption by 20% even though the capacitive load
increases by 50% [13]. Furthermore, this technique can be combined with other
techniques at the architectural level, for instance pipelining and interleaving, to save
more power. In some cases, such as digital filters, faster algorithms combined with
voltage scaling can be used for energy-efficient applications [18].
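The effect of loop unrolling on a recursive filter can be checked numerically. The sketch below assumes a first-order filter of the form y[n] = b0·x[n] + a1·y[n-1], which matches the coefficients b0, a1, a1·b0 and a1² that appear in the unrolled structure; the function names and sample values are illustrative assumptions.

```python
def iir_direct(x, b0, a1):
    """y[n] = b0*x[n] + a1*y[n-1], one output per loop iteration."""
    y, prev = [], 0.0
    for xn in x:
        prev = b0 * xn + a1 * prev
        y.append(prev)
    return y


def iir_unrolled(x, b0, a1):
    """Two outputs per iteration (x must have even length).

    Substituting y[n] into y[n+1] gives
    y[n+1] = b0*x[n+1] + a1*b0*x[n] + a1^2*y[n-1],
    so both outputs depend only on the state from two samples back.
    """
    y, prev = [], 0.0
    a1b0, a1sq = a1 * b0, a1 * a1      # precomputed coefficient products
    for n in range(0, len(x), 2):
        yn = b0 * x[n] + a1 * prev
        yn1 = b0 * x[n + 1] + a1b0 * x[n] + a1sq * prev
        y += [yn, yn1]
        prev = yn1
    return y


samples = [1.0, 2.0, 0.0, -1.0, 4.0, 3.0]
print(iir_direct(samples, b0=0.5, a1=0.25))
print(iir_unrolled(samples, b0=0.5, a1=0.25))   # same sequence
```

In hardware the two outputs of the unrolled loop can be computed in parallel, which is where the timing slack for voltage scaling comes from; the extra multipliers account for the increased capacitance mentioned above.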
The use of two parallel datapaths is equivalent to interleaving two computational tasks.
A datapath that determines the larger of C and (A + B) is shown in figure 1.13. It
requires an adder and a comparator. The original clock frequency is 40 MHz.
In order to maintain the throughput while reducing the power supply voltage, we use a
parallel architecture. The parallel architecture, with twice the amount of resources, is
shown in figure 1.14. The clock frequency can be halved, from 40 MHz to 20 MHz, since
two tasks are executed concurrently. This allows the supply voltage to be scaled down
from 5 V to 2.9 V. Since extra routing is required to distribute computations to the two
parallel units, the capacitive load increases by a factor of 2.15.
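The figures quoted for the parallel datapath can be combined into a single power ratio. The sketch below assumes that power scales as C · V² · f, as in equation (1.15), with the switching activity unchanged; the function name is hypothetical.

```python
def relative_power(cap_factor, v_new, v_old, freq_factor):
    """Power of the modified design relative to the original (P ~ C * V^2 * f)."""
    return cap_factor * (v_new / v_old) ** 2 * freq_factor


# Parallel datapath: 2.15x capacitance, 5 V -> 2.9 V, 40 MHz -> 20 MHz.
ratio = relative_power(cap_factor=2.15, v_new=2.9, v_old=5.0, freq_factor=0.5)
print(round(ratio, 2))   # 0.36
```

Under these assumptions the parallel architecture consumes roughly 36% of the original power at the same throughput, at the price of about twice the area.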
Figure 1.13 Original Data path.
Figure 1.15 Pipeline Implementation
Pipelining is another method for increasing the throughput. By adding a pipeline buffer
or register after the adder, as in figure 1.15, the throughput can be increased and power
can also be saved. The power of the three datapaths is calculated using equation (1.15)
and compared; the comparison shows that the pipelined architecture consumes less power.
The main advantage of pipelining is its low area overhead in comparison with parallel
datapaths. Another benefit is that the number of glitches can be reduced. However, since
the delay increases significantly as the voltage approaches the threshold voltage, and the
capacitive load due to routing and/or pipeline registers increases, there exists an optimal
power supply voltage. Reducing the supply voltage below this optimum increases the power
consumption.
iv. Logic Level
The low power techniques at the logic level focus mainly on reducing the switching
activity factor by exploiting signal correlation, and on reducing the node capacitances.
[Figure: Subtractor-based datapath with registers R1, R2 and R3]
Precomputation [18] uses the same concept to reduce the switching activity factor: a
selective precomputation of the output of a circuit is done before the outputs are
required, and this reduces the switching activity by gating the inputs to the circuit. As
shown in figure 2.9, the input data is partitioned into two parts, corresponding to
registers R1 and R2. One part, R1, is processed in the precomputation block g one clock
cycle before the computation of A. The result from g decides the gating of R2, and power
can then be saved. In a comparator, for example, the comparison of the MSBs is performed
in g. If the two MSBs are not equal, the output from g gates the remaining inputs. In
this way, only a small portion of the inputs to the comparator's main block A (the
subtractor) changes, and the switching activity is therefore reduced.
Gate reorganization [14] is another technique, used to restructure the circuit. It can be
the decomposition of a complex gate into simple gates, the combination of simple gates
into a complex gate, the duplication of a gate, or the deletion or addition of wires. The
decomposition of a complex gate and the duplication of a gate help to separate the
critical and non-critical paths, which allows the size of gates in the non-critical path
to be reduced and, as a result, reduces the power consumption. In some cases the
decomposition of a complex gate increases the circuit speed and gives more room for power
supply voltage scaling. The composition of simple gates can reduce the power consumption,
because a complex gate can reduce the charging and discharging of a frequently switching
node. The deletion of wires reduces the circuit size and, as a result, the load
capacitance.
Traditionally, the logic coding style is chosen to enhance speed performance.
Careful choice of coding style is important to meet the speed requirement and minimize
the power consumption. This applies, for example, to finite state machines, where the
states can be coded with different schemes.
A bus is a main on-chip communication channel with a large capacitance. As the on-chip
transfer rate increases, the buses contribute a significant portion of the total power.
Bus encoding is a technique that exploits properties of the transmitted signal to reduce
the power consumption.
In CMOS circuits, the dynamic power consumption is caused by transitions. Spurious
transitions typically consume between 10% and 40% of the switching power in typical
combinational logic. In some cases, such as array multipliers, the number of spurious
transitions is large. To reduce spurious transitions, the delays of signals from
registers that converge at a gate should be roughly equal. This can be achieved by
inserting buffers and by device sizing [14]. Inserting buffers increases the total load
capacitance but can still reduce the spurious transitions. This technique is called path
balancing.
Many logic gates have inputs that are logically equivalent, i.e., swapping the inputs
does not modify the logic function of the gate. Examples of such gates are NAND, NOR and
XOR. However, from the power consumption point of view, the order of the inputs does
affect the power consumption. Consider figure 1.10: in a two-input NAND gate, the
A-input, which is near the output, consumes less power than the B-input, which is close
to ground, for the same switching activity factor.
Pin ordering assigns the more frequently switching inputs to the pins nearer the output
node, which consume less power. In this way the power consumption is reduced at no cost.
However, the statistics of the switching activity factors of the different pins must be
known in advance, and this limits the use of pin ordering [13].
Different logic styles have different electrical characteristics. The selection of logic
style affects the speed and the power consumption. In most cases, standard CMOS logic is
used as a speed and power trade-off. In some cases other logic styles, such as
complementary pass-transistor logic (CPL) [15], are more efficient.
Transistor sizing affects both delay and power consumption. Generally, a smaller gate
has smaller capacitance and consumes less power. Minimizing the transistor sizes while
meeting the speed requirement is a trade-off. Typically, transistor sizing uses static
timing analysis to find the gates whose slack time is larger than 0, so that they can be
reduced in size. Transistor sizing is generally applicable to different technologies.
Area: The total area of the whole design. It mainly comprises the total number of cells,
gates and flip flops used. For an FPGA implementation the total chip area comprises logic
blocks and Look Up Tables (LUTs).
Speed: A measure of the rate at which the designed IC produces an output. It depends
on various factors such as gate delays, clock delays, wire delays and the critical path
of the design.
1.4. Problem definition:
A low power VLSI design of a selective median filter circuit has been our primary goal.
The problem has been divided into two phases, namely
(i) the design of an algorithm for a median filter with a double derivative filter, and
(ii) the low power VLSI design of this filter.
The main objective is to remove spike (salt and pepper) noise from the noisy image
while preserving the sharp edges. This is the main reason why the double derivative
filter has been used. Designing complete hardware for a median filter with sorting
networks is computationally expensive. The main power consuming component is the
sorting circuit, which is used repeatedly in simulation. Here we have used majority
voting logic (MVL), which uses bit level computation. Taking 3 x 3 windows, the median
value is calculated using MVL. Preserving edges is another goal, because noise
suppression and edge preservation go side by side. A design that consumes as little power
as possible is desirable.
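A bit level median computation based on majority voting can be sketched as follows. This is one common bit-serial formulation and is offered only as an illustration; it is not claimed to be the exact circuit of [4] or of this thesis, and the function names and the bookkeeping list `forced` are hypothetical.

```python
def majority(bits):
    """Return 1 if more than half of the bits are 1, else 0 (odd-length input)."""
    return 1 if 2 * sum(bits) > len(bits) else 0


def majority_median(values, width=8):
    """Bit-serial median of an odd number of unsigned integers.

    Bits are processed from MSB to LSB. The majority of the current bits is
    the next bit of the median. A sample whose bit disagrees with the majority
    is known to lie above or below the median, so its remaining bits are
    forced to that bit value for all later votes.
    """
    forced = [None] * len(values)      # None: undecided; 0 or 1: forced bit
    result = 0
    for bit in range(width - 1, -1, -1):
        bits = [(v >> bit) & 1 if f is None else f
                for v, f in zip(values, forced)]
        m = majority(bits)
        result = (result << 1) | m
        for i, b in enumerate(bits):
            if forced[i] is None and b != m:
                forced[i] = b
    return result


window = [12, 200, 45, 45, 0, 255, 90, 17, 63]   # a 3 x 3 window, row by row
print(majority_median(window))                   # 45
```

No comparators or swaps are needed: each of the eight steps is a single majority vote over nine bits, which is what makes this approach attractive for low power hardware.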
In VLSI system design, the power consumed by the switching activity of the various gates,
given by the equation P = CV²f as mentioned in (1.15), has to be analyzed and optimized
for a low power implementation. By minimizing any of the parameters, namely the supply
voltage V, the capacitance C (diffusion and depletion capacitance of the various CMOS
components) or the clock frequency f of each subsystem, we can reduce the power
consumption. For example, if we can minimize the switching activity, the f and C factors
of the switching power can be reduced. By optimizing the circuit we can also minimize the
capacitance and hence the overall power. In our thesis, logic level optimization has been
tried in the behavioral code of forty one 8-bit sorting networks.
1.5. Motivation:
In image processing, noise removal plays a vital role as it eases the object recognition
and object identification problems. The median filter is one of the most commonly used
noise suppressing filters for this purpose. Median filters are implemented in software in
most systems. When real time image processing applications such as robot vision and
visual feedback control are concerned, minimum latency in the system's response is of
paramount importance, making software approaches unacceptable. However, the median filter
is computationally expensive and hence consumes a lot of power.
There has been tremendous effort on sorting network design and comparator design at the
circuit level [1, 3]. VLSI median filters have been developed using both digital and
analog technologies. There are a number of applications of the median filter in the
fields of image processing and remote sensing, such as medical imaging, satellite imaging,
geographic image surveillance and seismographic analysis. As it is the most commonly used
filter for removing impulsive or spike noise in image processing, an on-chip median filter
is nowadays very cost effective. Therefore many VLSI designs, both analog and digital,
have been implemented, and there have been many efforts to build faster and faster
sorters.
The fast growing demand for System on Chip (SoC) design requires more highly developed
but complicated techniques. However, this type of application has brought heavy power
consumption and dissipation. There is therefore currently a heavy demand for reducing
power while retaining the same functionality, whereas battery and power optimizing
technology has not kept pace. Most of these products include embedded microprocessors,
DSPs and ASICs. Low power design of any VLSI system is therefore a challenging task
nowadays.
For low power design, various architecture level designs such as systolic and bit level
designs [4-6] have been proposed; they are fast, but there has been little concern for
power. For low power applications, optimization has been done at different levels of the
VLSI design process, as discussed in section 1.2.
1.6 Chapter-wise Organization of the thesis:
The rest of the thesis is organized as follows:
(1) Chapter-2: This chapter briefly introduces the related work on hardware design of
median filters, sorting networks and their low power designs.
(2) Chapter-3: This chapter presents the design of a 16-bit general processor.
(3) Chapter-4: This chapter presents a design of the selective median filter and its low
power implementation.
(4) Chapter-5: This chapter presents the conclusions and suggests some directions for
further research.
Chapter 2
Literature Survey
A double derivative impulsive noise detector has been proposed in [2]. In this paper a
double differential filter is applied to each target pixel to check whether that pixel
is noisy or not. Usually applying a double derivative filter restores the sharp edges
and high pixel values in the remaining image. Using a 3 x 5 window, a double derivative
(Laplacian) filter is applied and the filtered value is compared with a threshold value.
If it is less than the threshold, the median filter is applied. Otherwise it is concluded
that the test pixel is of high intensity value and hence there is less chance of
corruption. The complete mechanism is given in figure 2.1.
[Figure 2.1; recoverable block labels: Median filter, Next Pixel]
The result of the designed filter varies with the threshold value. The threshold can
therefore be selected to suit each image, depending on its intensity or pixel values,
so that a more optimized result is possible.
2.1. A Survey on Hardware Implementation for Median Value Evaluation
Mustafa Karaman et al. [3] present a general purpose median filter in the form of two
single-chip median filters: one extensible and one real-time. They found experimentally
that the exact median of the elements in a window of size w = 9 with arbitrary word
length L can be found using only one extensible median filter chip. This can also be
extended to any word length L. For w > 9 with arbitrary L, the number of chips required
to find the exact median is no more than the smallest integer not less than (w/9)².
Their simulation showed that on a 40 MHz chip the filter can compute 30/L mega medians
per second. The real-time median filter, with fixed size w = 9 and L = 8, can generate
50 mega medians per second on a 50 MHz clock.
This work presents an odd/even transposition sorting network, which is a pipelined
modular structure consisting of w compare-and-swap stages. The extensible
median filter consists of
Each compare-and-swap unit, shown in figure 2.3, has three states: reset, swap or pass,
and equal/not equal. Two bits Ai and Bi from two different input sets are compared. If
Ai > Bi the inputs are passed unaltered to the next level; if Ai < Bi the values are
swapped and passed to the next stage. When this is done on every clock cycle, the median
output is obtained after an initial delay, as shown in the sorting network (figure 2.2).
[Figure 2.3 legend: R: Reset; S: Swap/pass; E: Equal/not equal]
Hideo Yamasaki et al. [4] presented a low power, high speed mixed-signal VLSI median
filter that uses majority voting circuits. In this design an eight-bit, 41-input median
filter circuit was designed and fabricated in 0.35 µm technology. It uses a binary search
algorithm, a bit-comparison based technique that is compatible with direct hardware
implementation.
The mechanism that has been used is given below:
Majority Voting Technique:-
Example:-
The example above explains median detection by the binary search algorithm, taking the
search over five 4-bit input data as an example. First, majority voting is carried out
on the MSBs of all data: the input data are divided into two groups, one with an MSB of
“0” and the other with an MSB of “1”, and the majority group wins. Since the median
value is necessarily in the winner group, the MSB of the median is identical to the
winner bit (“1” in this example). Second, the remaining bits of the data in the loser
group are all converted to the loser MSB (“0” in this case) in order to propagate the
information that the losers are all smaller (or larger) than the median value. (In the
present example, all the data in the loser group are converted to “0000”.) Majority
voting for the second bits is then carried out in the same way, and the winner bit yields
the second bit of the median. The procedure is continued down to the LSBs. In this
manner, the median value is determined by repeating the majority voting with simple
logic control.
[Figure 2.4; recoverable labels: inputs IN0[7], IN1[7], ..., IN40[7] feeding rows of LC units, each row followed by an MVC unit]
The circuit organization of the majority voting circuit for forty-one 8-bit data is
presented in figure 2.4. Every bit of the 41 data is fed to the circuit simultaneously
and the median value is calculated asynchronously. The LC units perform the logic
control, which converts the remaining bits of the loser data as described in the
algorithm. The MVC unit has been implemented using floating gate technology as shown in
figure 2.5.
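The binary search by majority voting described above can be sketched in software as follows. This is only a minimal Python illustration under stated assumptions, not the circuit of [4]: the function name `majority_vote_median` and the sample data are hypothetical, and the loop visits the bit positions one after another, whereas the hardware has a separate LC and MVC row for each bit.

```python
def majority_vote_median(values, bits):
    """Median of an odd number of unsigned integers by bitwise majority voting.

    Illustrative sketch only; `values` must hold an odd number of
    non-negative integers that fit in `bits` bits.
    """
    n = len(values)
    if n % 2 == 0:
        raise ValueError("majority voting needs an odd number of inputs")
    work = list(values)          # working copy; losers are overwritten
    median = 0
    for b in range(bits - 1, -1, -1):            # scan from MSB down to LSB
        ones = sum((v >> b) & 1 for v in work)   # votes for "1" at this bit
        winner = 1 if 2 * ones > n else 0        # majority bit
        median |= winner << b                    # winner is the median's bit
        low_mask = (1 << (b + 1)) - 1            # bit b and all bits below it
        for i, v in enumerate(work):
            bit = (v >> b) & 1
            if bit != winner:
                # Loser: force bit b and every lower bit to the loser bit,
                # so the datum stays smaller (or larger) than the median.
                work[i] = (v | low_mask) if bit else (v & ~low_mask)
    return median


if __name__ == "__main__":
    # Five 4-bit inputs (hypothetical data, not the thesis example).
    print(majority_vote_median([9, 3, 12, 5, 14], bits=4))  # prints 9
```

For the 41-input, 8-bit case discussed in the text the same function would be called with 41 values and `bits=8`.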
[Figure 2.5; recoverable labels: inputs IN0, IN1, ..., IN40, Reset MOS, Buffer, Majority]
Their experimental results show a 74% reduction in power compared with their previous
work.
Krishna J. Palaniswamy et al. designed an 8-bit VHDL-based 2-D median filter using
Mentor Graphics tools. The algorithm is based on the sorting mechanism proposed in [1].
The VHDL code was written, synthesized and optimized for an IC layout using CMOS 2 µm
technology. The algorithm was implemented in MATLAB and then synthesized with CAD
tools. The paper is a standard reference for building blocks in the development of
post-processing Application Specific Integrated Circuit (ASIC) based systems for video
signals. The main motivation of that project is to combine the design methodologies and
environments for digital signal processing architectures and applications with VLSI
design.
The mathematical model for sorting:
A compare-and-swap unit is used as in [1]. First, the numbers are paired starting from
the left. Since there is an odd number of values, one number is left unpaired at the
end. After each comparison step the pairing is done again, this time starting from the
rightmost side. This is shown in the example below. Each compare-and-swap stage compares
the two values and places the larger value on the left side of the block and the smaller
on the right side.
125 49 38 81 102
125 49 81 38 102
125 49 81 38 102
125 81 49 102 38
125 81 49 102 38
125 81 102 49 38
125 81 102 49 38
125 102 81 49 38
125 102 81 49 38
125 102 81 49 38
125 102 81 49 38
125 102 81 49 38
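The trace above can be reproduced with a short odd/even transposition sort that keeps the larger value on the left. The Python below is a minimal sketch under stated assumptions: the function name `odd_even_transposition_desc` is hypothetical, and pairing "from the right" is modelled as pairing that starts at the second element, which is equivalent when the number of values is odd, as it is here.

```python
def odd_even_transposition_desc(values):
    """Sort into descending order with alternating compare-and-swap stages.

    Returns the sorted list and the list state after every stage.
    Illustrative sketch only.
    """
    data = list(values)
    n = len(data)
    trace = [tuple(data)]                    # trace[0] is the input
    for stage in range(n):                   # n stages are enough for n values
        start = stage % 2                    # 0: pair from left, 1: shifted pairing
        for i in range(start, n - 1, 2):
            if data[i] < data[i + 1]:        # larger value goes to the left
                data[i], data[i + 1] = data[i + 1], data[i]
        trace.append(tuple(data))
    return data, trace


if __name__ == "__main__":
    final, trace = odd_even_transposition_desc([125, 49, 38, 81, 102])
    for row in trace:
        print(*row)
    print("median:", final[len(final) // 2])   # prints median: 81
```

After the fourth stage the list is already sorted, so the last stage leaves it unchanged, as in the final rows of the example.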
Krishna et al. have proposed a median filter [7] that uses a fast memory with
multi-ported enable signals. It was also implemented in VHDL on the Mentor Graphics
platform. The special RAM cell uses pass transistor logic. The previous 1-D median
filter has been extended to a two-dimensional architecture using registers, as shown in
the figure.
[Figure; recoverable block labels: Input block, Register file, 1-d Median filter]
The VHDL implementation shows that the control register file is handled by address,
read and write commands, and that all these signals are synchronized by a single clock.
[Figure; recoverable labels: REGISTER FILE, address, WR, RD, 8 bit data, 6 bit data]
An FPGA implementation of a median filter is described in [12]. The paper gives the
algorithm and implementation details of a sliding real-time 3 x 3 median filter. The
design is implemented on a XILINX XC4010 FPGA chip.
On each clock cycle the 3 x 3 window moves by one pixel, so only one new column has to
be shifted in while the other two columns stay the same. Hence, with a parallel-serial
input scheme, only vertical sorting is needed. The results show fewer gates and
flip-flops than [1] at the same throughput.
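The column reuse described above can be illustrated in software. The Python below is a sketch under stated assumptions rather than the design of [12]: the function names are hypothetical, border pixels are simply copied, and the three sorted columns are combined with the well-known identity that the median of nine values equals the median of (largest of the column minima, median of the column medians, smallest of the column maxima).

```python
def med3(a, b, c):
    """Median of three values using only min/max (compare-and-swap style)."""
    return max(min(a, b), min(max(a, b), c))


def sorted_column(image, r, c):
    """Vertically sort the three pixels of column c centred on row r."""
    return sorted((image[r - 1][c], image[r][c], image[r + 1][c]))


def sliding_median_3x3(image):
    """3 x 3 median filter that sorts only one new column per window step."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]          # border pixels are left unchanged
    if h < 3 or w < 3:
        return out
    for r in range(1, h - 1):
        # Two columns are carried over from the previous window position.
        cols = [sorted_column(image, r, 0), sorted_column(image, r, 1)]
        for c in range(1, w - 1):
            cols.append(sorted_column(image, r, c + 1))   # the only new sort
            lo = max(col[0] for col in cols)     # largest of the minima
            mid = med3(*(col[1] for col in cols))  # median of the medians
            hi = min(col[2] for col in cols)     # smallest of the maxima
            out[r][c] = med3(lo, mid, hi)
            cols.pop(0)                          # oldest column leaves the window
    return out


if __name__ == "__main__":
    img = [[10, 200, 30, 40],
           [50, 255, 70, 80],
           [90, 100, 0, 120]]
    print(sliding_median_3x3(img)[1])
```

Each window step therefore costs one three-element sort plus a fixed number of min/max operations, instead of a full nine-element sort.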
A combined VLSI architecture for nonlinear image filters is presented in [9]. The filter
types are weighted order statistics, stack, nonlinear mean, Teager, polynomial and
rational filters. The VLSI architecture is given in the figure. It consists of memory,
registers, the individual filters, a multiplexer, control lines and a normalizer. The
memory contains the data taken from the image, and the multiplexer is used to select any
of these six filters. This is a multifunctional image processing solution for
applications that require the output of nonlinear filters, such as edge-preserving
smoothing, noise filtering and image segmentation.
[Figure; recoverable block labels: Memory, nine 8-bit registers, WOS, Stack, Nonlinear mean, Teager, Polynomial, Rational, Normalizer]
A VLSI design of rank order filtering (ROF) using a DCRAM architecture adapts a
maskable memory for real-time speech and image processing. Meng-Chun Lin et al. have
proposed a bitwise sorting based algorithm, as described in [3], with a specially
defined memory called Dual Cell Random Access Memory (DCRAM). They have designed a
complete 16-bit image processor with the architecture shown in figure 2.11. The main
component of this architecture is a dedicated memory whose structural model is shown in
the figure below. The mechanism consists of three main steps: i) bit counting,
ii) threshold decomposition and iii) polarization. As in the MVC circuit, the scan
starts from the MSB, and bit counting is done to calculate the majority bit; this is also
called threshold decomposition. Depending on the winner bit, the other bits of the loser
data set are modified; this step is called polarization. This is very similar to the MVC
technique [1]. Here the three steps are implemented in a pipelined manner, since the
DCRAM can perform a bit-sliced read (which reads data from memory), a partial write and
the computation of the majority bit. The bit-sliced read and partial write are driven by
maskable registers. With recursive execution of the bit-sliced read and partial write,
the DCRAM can realize ROF effectively in terms of cost and speed.
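A software analogue of these three steps, generalized from the median to an arbitrary rank, might look like the sketch below. It is an illustration under stated assumptions and does not model the DCRAM hardware: the function name `rank_order_select`, the 1-based `rank` parameter (1 selects the smallest value) and the mapping of the three named steps onto the commented lines are all choices made here.

```python
def rank_order_select(values, rank, bits):
    """Return the rank-th smallest of `values` (rank counts from 1).

    Bit-serial selection from MSB to LSB; illustrative sketch only.
    `values` must be non-negative integers that fit in `bits` bits.
    """
    n = len(values)
    if not 1 <= rank <= n:
        raise ValueError("rank must lie between 1 and len(values)")
    work = list(values)
    result = 0
    for b in range(bits - 1, -1, -1):
        ones = sum((v >> b) & 1 for v in work)      # bit counting on one bit slice
        zeros = n - ones
        bit = 0 if rank <= zeros else 1             # threshold decision for this bit
        result |= bit << b
        low_mask = (1 << (b + 1)) - 1               # bit b and all lower bits
        for i, v in enumerate(work):                # polarization of the other group
            if (v >> b) & 1 != bit:
                if bit == 0:
                    work[i] = v | low_mask          # larger than the result: all ones
                else:
                    work[i] = v & ~low_mask         # smaller than the result: all zeros
    return result


if __name__ == "__main__":
    data = [9, 3, 12, 5, 14]
    print([rank_order_select(data, k, bits=4) for k in range(1, 6)])
    # prints [3, 5, 9, 12, 14]
```

Because polarized values keep their side of the ordering in every later bit slice, the rank never has to be adjusted during the scan; with `rank = (n + 1) // 2` the function returns the median.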
[Figure; recoverable labels: Instruction Decoder, addr, Wr, CP, WMR, RMR, DCRAM (Address Decoder, Data Field, Computing Field), Other design Blocks, Output Data]
Different techniques for low power VLSI design are illustrated in [10, 11]. The
different levels of abstraction for low power solutions are described in detail in [15].
A pass transistor logic based design of a 32-bit adder is illustrated in [15]. The
design of digital systems with combinational and sequential circuits is given in detail
in [16]. The basics of the VHDL language are given in the VHDL Primer [17]. Various
system-level designs (sequential and combinational), a CPU design and its behavioral
model are given in “VHDL Programming by Example” by Douglas Perry [18] and “Digital
System Design” by Tomas Lang et al.
2.2. Performance Evaluation
The resulting images are taken from the MVC logic of [4].
Figure 2.12 Noisy LENA Image
Figure 2.13 Filtered Image by MVC [4]
The implementation results of the voltage-level floating gate based majority voting
circuit [3] are presented and compared with the previous design of the general purpose
median filter [3].
Table 2.1
Logic operation   Technology              Latency   Chip area   Power
MVC [1]           Mixed signal, 0.35 µm   12 ns     0.51 mm²    72.6 mW
Current
Chapter 3
Processor Design
There are generally two types of microprocessors: general purpose microprocessors and
dedicated microprocessors. A general purpose microprocessor such as the Intel Pentium
works under the control of software or an operating system. Dedicated microprocessors,
on the other hand, are less complicated and specially built for certain tasks. They are
most commonly known as Application Specific Integrated Circuits (ASICs). The digital
circuit of a microprocessor is called the Central Processing Unit (CPU). The logic
circuit of a CPU is of two types: data path and control path circuits.
The design process for a general purpose microprocessor consists of building the
different blocks or components of the microprocessor, such as the ALU, control unit,
instruction register and memory unit, with a common data bus. The data path for the
various instructions is then designed, and it is checked whether the instruction set
works correctly under the control unit. The various components are controlled by
signals generated by the control unit, which is based purely on sequential circuits
(finite state machines).
Here a simple memory mapped processor, shown in figure 3.1, has been designed, and its
operation based on a 16-bit instruction set is shown in figure 3.2 below.
[Figure 3.1; recoverable labels: CPU, MEM, Clock, Reset, VMA, Ready, Addr, Data]
3.1. Instruction set for above CPU
Double Word
[Figure; recoverable labels: Control, Clock, Reset, Ready, RW, VMA, ProgCnt, ProgSel, AddrReg, AddrSel, InstrReg, InstrSel, OpReg, OpRegSel, OutReg, OutRegSel, Comp, Compsel, Shifter, ShiftSel, RegSel, Reg0, Reg1, Reg3, Reg4, Reg5, Reg6, Reg7]
A general processor has been designed with the following components.
This design has been written in VHDL and synthesized for the Spartan 3 FPGA kit
(XC3S250). The generated test bench waveforms are shown in the figures below.
Figure 3.4 Snapshot of Test bench Waveform-1 for CPU Simulation
Figure 3.5 Snapshot of Test bench Waveform-2 for CPU Simulation
Chapter 4
Here our designed filter has been implemented using various low power techniques at the
logic level and the algorithmic level of the majority sorting technique mentioned in
[4]. Different logic styles have been tried while designing the sorting circuit. In the
following section the double derivative mechanism and the majority bit evaluation
scheme are discussed.
A standard LENA image of size 250 x 250 is taken, and the above selective median
filter is applied to it and tested.
In standard 1-D and 2-D median filtering applications a window of size w is used, where
w is odd; in the 2-D case the window is usually taken as 3 x 3 or 5 x 5.
[Figure; recoverable labels: image f(x, y), 3 x 3 window at (x, y), axis Y]
Median value = F(x11, x12, x13, x21, x22, x23, x31, x32, x33)
             = middle (fifth) value of SORT(x11, x12, x13, x21, x22, x23, x31, x32, x33)
x11 x12 x13 x14 x15 x’11 x’12 x’13 x’14 x’’11 x’’12 x’’13
x21 x22 x23 x24 x25 => x’21 x’22 x’23 x’24 => x’’11 x’’12 x’’13
x31 x32 x33 x34 x35 x’31 x’32 x’33 x’34 x’’11 x’’12 x’’13
Here x22 is the target pixel, so instead of loading the full 3 x 5 window into memory,
only five values are taken and the differences of their values are calculated.
Where
and
4.3. Simple memory mapped Control Unit design for median filter
[Figure; recoverable block labels: RAM, IPU, CU, MVC]
Figure 4.3- MATLAB User Defined function blocks for Median filter
Figure 4.4 (a) Original Image (b) Image corrupted by 10% salt and pepper noise (c) Filtered Image
Figure 4.5 Image corrupted by 50% salt and pepper noise and filtered image
As shown in figures 4.4 and 4.5, the standard Lena image (250 x 250) was corrupted
with 10% and 50% salt and pepper impulsive noise using MATLAB 7.1, and each noisy image
was then filtered by the selective median filter.
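The corruption step of this experiment can be approximated outside MATLAB with the sketch below. It is an assumption-laden illustration, not the MATLAB routine used in the thesis: the function name `add_salt_and_pepper` is hypothetical, exactly `round(density * pixels)` positions are corrupted, and salt (255) and pepper (0) are chosen with equal probability for an 8-bit grayscale image stored as a list of rows.

```python
import random


def add_salt_and_pepper(image, density, rng=None):
    """Return a copy of `image` with a fraction `density` of pixels set to 0 or 255.

    Illustrative sketch only; `image` is a list of rows of 8-bit gray values.
    """
    rng = rng or random.Random()
    h, w = len(image), len(image[0])
    noisy = [row[:] for row in image]
    count = round(density * h * w)                 # number of corrupted pixels
    for idx in rng.sample(range(h * w), count):    # distinct pixel positions
        r, c = divmod(idx, w)
        noisy[r][c] = 255 if rng.random() < 0.5 else 0   # salt or pepper
    return noisy


if __name__ == "__main__":
    # Flat gray 250 x 250 test image standing in for the Lena image.
    flat = [[128] * 250 for _ in range(250)]
    for density in (0.10, 0.50):
        noisy = add_salt_and_pepper(flat, density, random.Random(0))
        changed = sum(p != 128 for row in noisy for p in row)
        print(density, changed)
```

The noisy image returned here would then be passed to the filter under test; the filtering step itself is not part of this sketch.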
[Simulink diagram; recoverable block labels: Image From File (EYES.png, outputs R, G, B), Constant, ROM Unit, Subsystem (inputs x1 to x9 on In1 to In9, output Out1), Gateway Out, Display, System Generator]
Figure 4.6: Simulink Block Diagram Of median calculator of nine pixel values
System Generator (XILINX block set): This block controls the system and the
simulation.
[Subsystem diagram; recoverable labels: inputs In1 to In9, Gateway In blocks, Black Box with ports ip1 to ip9 and op1, output Out1]
Figure 4.7 Diagram of median value evaluation Subsystem using XILINX Block set
Black Box (XILINX block set): It implements the majority voting circuit in VHDL.
Gateway In and Gateway Out (XILINX block set): These are used for type conversion
between MATLAB data types and XILINX data types.
4.4. Results
Here the power reports of our various designs are given. Table 4.4.1 gives the total
power consumption of the sorting circuit for forty-one 8-bit values; the simulation was
done in Synopsys. The same design, synthesized for VIRTEX 5, produced the XPOWER power
report given in 4.8. A sorting circuit has been designed for forty-one 8-bit data values
using the MVC logic mentioned in [4]. This design was further re-implemented using
Logic 1 and Logic 2 as described in section 4.3, and the power reports produced by those
designs are shown in tables 4.4.2 and 4.4.3. The same code has been re-implemented for
nine 8-bit input values on the XC5vl250t-3ff1136, and further applying Logic 1, Logic 2
and Logic 3 gives the optimized power reports in tables 4.4.4, 4.4.5 and 4.4.6
respectively.
Table 4.4.3 Power summary of sorting circuit of 41 pixel values using logic-2
Chapter 5
5.1. Conclusion
The above results confirm that various logic-level and algorithm-level optimizations
can make the design more power efficient. Table 5.1 shows the comparison between the
MVC sorting circuit [4] and our work. These results have been taken from the power
reports and design reports of the VHDL design of the sorting circuit for forty-one 8-bit
values implemented in Synopsys. The selective median filter is effective because it
saves some extra computational work, and it is also fast because it is based on
hardwired logic. However, the area report of the design shows an increase in area, and
the number of gates or cells is higher than in the other designs. Another factor is the
delay or latency: the value shown in the table is not optimized for clock delay and
pin-to-pin delay. It shows only the switching delay, although various timing analysis
report generators are available in CAD tools. Our simulation has been based entirely on
logic simulation of the sorting network for 3 x 3 windows.
References
Lumpur, Malaysia.
a General purpose Median Filter Unit in CMOS VLSI,” IEEE J. Solid-State Circuits, Vol.
4. Hideo Yamasaki and Tadashi Shibata, “A High Speed Median Filter VLSI Using
ESSCIRC, Grenoble, France, IEEE, 2005.
5. Long-Wen Chang and Jing-Ho Lin, “Bit level systolic arrays for real time median
filters”, IEEE, 1990.
7. Meng-Chun Lin, Lan-Rong Dung, “On VLSI Design of rank order filtering using
9. Gary Yeap, “Low power VLSI digital design”, Kluwer Academic Publishers, 1st
ed., 1998.
10. A. Bellaouar and M. I. Elmasry, “Low power digital VLSI design circuits and
systems”, 2nd ed, 1st Edn., Kluwer Academic Publishers, 1995.
14. Reto Zimmermann and Wolfgang Fichtner, “Low power logic styles: CMOS Versus
Pass-Transistor Logic”, IEEE Journal of Solid-State Circuits, Vol. 32, No. 7, July 1997.
17. J. Bhaskar, “VHDL Primer”, 3rd edition, Prentice Hall PTR, 1999.
Dissemination of Work