An FPGA-Based Fully Synchronized Design of A Bilateral Filter For Real-Time Image Denoising
Abstract—In this paper, a detailed description of a synchronous field-programmable gate array implementation of a bilateral filter for image processing is given. The bilateral filter is chosen for one unique reason: It reduces noise while preserving details. The design is described on register-transfer level. The distinctive feature of our design concept consists of changing the clock domain in a manner that kernel-based processing is possible, which means the processing of the entire filter window within one pixel clock cycle. This feature of the kernel-based design is supported by the arrangement of the input data into groups so that the internal clock of the design is a multiple of the pixel clock given by a targeted system. Additionally, by the exploitation of the separability and the symmetry of one filter component, the complexity of the design is considerably reduced. Combining these features, the bilateral filter is implemented as a highly parallelized pipeline structure with very economical and effective utilization of dedicated resources. Due to the modularity of the filter design, kernels of different sizes can be implemented with low effort using our design and given instructions for scaling. As the original form of the bilateral filter with no approximations or modifications is implemented, the resulting image quality depends on the chosen filter parameters only. Due to the quantization of the filter coefficients, only negligible quality loss is introduced.

Index Terms—Bilateral filter, field-programmable gate array (FPGA), image processing, noise reduction, real-time processing.

I. INTRODUCTION

BILATERAL filtering has gained great popularity in image processing due to its capability of reducing noise while preserving the structural information of an image. The bilateral filter [1] consists of two components. The detail-preserving property of the filter is mainly caused by the nonlinear filter component, also called the photometric filter. It selects the pixels of similar intensity, which are averaged by the linear component afterward. Very often, the linear component is formulated as a low-pass filter. The amount of noise reduction via selective averaging and the amount of blurring via low-pass filtering are adjusted by two parameters. The understanding of these parameters is very intuitive, which makes the bilateral filter an almost all-purpose solution in image processing. The authors of [2] and [3] show that noise filtering, despite the prevailing view, does not always imply resolution reduction but can even be used to sharpen the edges [2] or to enhance the flowlike structures [3]. In [4], the motion-adaptive bilateral filter is used for quality improvement in low bit rate video coding. Also, in [5], the bilateral filter is applied for noise reduction in a method for local tone mapping which maps a high dynamic range image to a low dynamic range image.

Recently, bilateral filtering has attracted considerable attention in medical image processing and nondestructive testing. The authors of [6] studied the impact of noise reduction by the bilateral filter applied to the reconstructed images. They concluded that the images processed with this filter show a significant improvement in image quality compared to their unfiltered counterparts. In [7], the authors discuss the results of noise reduction by the bilateral filter in projection space. This means that the noise filtering takes place prior to computing the reconstructed volume. It has been concluded that noise reduction of this kind can be translated into a dose reduction in X-ray computed tomography. Considering industrial applications, the dose reduction permits the reduction of the scanning time and thus allows a higher throughput of test items.

Our own experiments and studies shown in [8] and [9] confirm the possible dose reduction. As the reduction of the exposure time due to filtering is feasible, we are interested in a real-time filtering of projections. Moreover, the filter is not supposed to reduce the spatial resolution of projections, in order to maintain the visibility of defects in a reconstruction. Since we achieve very satisfying results considering detail preservation with our field-programmable gate array (FPGA) implementation presented in [10], we intend to give a deeper insight into our work.

The major contribution of this paper is the detailed description of a novel FPGA design architecture of the bilateral filter on register-transfer level (RTL). This abstraction level is chosen for the possibility of direct specification of the clocking scheme [11]. The main advantages of this design are the capability of real-time processing and economical and effective utilization of resources through the following.
1) Sorting the data into equal groups to which separate pipelines are assigned.
2) Raising the internal clock frequency according to the data flow.
3) Eliminating the need for an external image buffer.

Manuscript received March 5, 2012; revised August 6, 2012 and October 24, 2012; accepted December 6, 2012. Date of publication October 25, 2013; date of current version February 7, 2014.
A. Gabiger-Rose, R. Weigel, and R. Rose are with the Institute for Electronics Engineering, Friedrich-Alexander University of Erlangen-Nuremberg, 91058 Erlangen, Germany (e-mail: anna.gabiger-rose@fau.de; robert.weigel@fau.de; richard.rose@fau.de).
M. Kube is with the Department of Contactless Test and Measuring Systems, Fraunhofer Institute for Integrated Circuits, 91058 Erlangen, Germany (e-mail: matthias.kube@iis.fraunhofer.de).
Digital Object Identifier 10.1109/TIE.2013.2284133
0278-0046 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
4094 IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, VOL. 61, NO. 8, AUGUST 2014
Moreover, due to the modularity of the design, it can be extended to implement arbitrary kernel sizes with low effort. The instructions required for this can be found later in this paper.

The remainder of this paper is organized as follows. In Section II, we consider the related work. After a short description of the bilateral filter in Section III, we give a detailed description of our FPGA design in Section IV. Section IV is the main part of this paper, presenting the filter design stage by stage. In Section V, the criteria applied to the evaluation of the image quality before and after the noise filtering are detailed. After that, in Section VI, the results are discussed, and the performance potential of our filter design is analyzed.

II. RELATED WORK

Since the bilateral filter is in widespread use, a lot of effort has been put into its acceleration for use in practical applications. Among the publications concerning the speeding up of bilateral filtering, two main trends can be identified. One stream is focused on the modification of the filtering components, resulting in an efficient algorithm. Another trend is to accelerate the filtering through parallelizing the algorithm or through hardware acceleration, including modifications of the filter at the same time.

In [12], a fast approximation of the original bilateral filter is proposed. Here, the 2-D filtering is separated into two 1-D operations performing 1-D bilateral filtering in one arbitrary dimension and filtering the intermediate result in the same manner in the subsequent dimension. The authors report that the proportionality of the execution time to the number of filter dimensions decreases from exponential to linear. This approach requires a little memory overhead but results in a filter which is fast enough to be used for preprocessing in video compression systems. However, as the photometric component of the bilateral filter is not separable, the image resulting from the modified filter is documented to be slightly different from the image produced by the original filter.

Another acceleration approach proposed in [13] has given a basis for numerous extensive works. This approach provides a numerical scheme for speeding up the filtering via a piecewise-linear approximation of the bilateral filter in the intensity domain and substituting the low-pass filtering by downsampling. In [14], this technique is extended by transposing the computation to a 3-D space presenting the image intensity as a third dimension over the 2-D image coordinate space. After that, the authors of [15] formulated the concept of the bilateral grid and implemented the bilateral filter using the proposed data structure on three different graphics processing units (GPUs). Only by means of this hardware acceleration does processing at 30 fps become possible, which they regard as real-time performance. Later, the technique proposed in [13] was also implemented on a GPU by the authors of [16] and is also capable of real-time processing with the same frame rate. More recently, the lazy sliding window implementation of the approach in [13] was proposed in [17]. This method is suitable for single-instruction-multiple-data-type processors like DSPs. In this case, the speedup also allows applications requiring real-time performance. The main drawback of the filter acceleration approach discussed so far is the high amount of memory required for the implementation.

Instead of a piecewise-linear approximation and subsampling, the idea of utilizing a histogram-based approach for accelerating the filter is presented in [18] and [19]. The main difference between these two works is that, in [18], a hierarchy of partial distributed histograms on multiple tiers is computed and adjusted for each output pixel while the author of [19] calculates the integral histogram of the image and extracts the histogram for each target filter window to obtain one output pixel. Both methods are fast, but a real-time performance of the histogram-based approach in [19] can only be achieved by the very-large-scale-integration design of the filter shown in [20]. The memory demand of the histogram-based acceleration method is also high but is lower than that of the piecewise-linear approximation and subsampling approach.

The aforementioned examples show that a filter modification technique reaches real-time performance only if its implementation utilizes hardware acceleration. Most of the referred works rely on GPUs for acceleration. However, in fields of application in which high power efficiency is crucial, an FPGA solution is preferable. In [21], an algorithm for the denoising of medical images is implemented on an FPGA and four different GPUs. The authors show that the power consumption of their FPGA implementation is always significantly lower. Furthermore, the authors of [21] point out that an FPGA implementation allows latency to be counted in image lines, resulting in delays lower than one frame, while the latency on a GPU is always one frame. This is relevant for many medical applications which demand fast image output to supply interactive operations.

The authors of [22] also choose an FPGA implementation for their image processing system because moving time-critical functionalities, like the edge detection in an image, to hardware platforms makes it possible to keep delays in the control loop to a minimum. The authors of [23] and [24] report excellent experience of using FPGAs for motion control of robots based on real-time image processing. The main reason for using FPGAs for real-time robotics tasks is the ability of FPGAs to satisfy the requirement for high computational power and data throughput [24]. Moreover, FPGA solutions offer additional advantages, such as reconfigurability and portability.

However, considering complexity and timing constraints of the algorithm to be implemented, the suitability of the chosen hardware platform has to be checked [25]. A DSP implementation has been regarded to be more appropriate for complex algorithms with high data dependence. For algorithms with low data dependence and high timing constraints, an FPGA solution is more suitable. The authors of [25] discuss in detail the advantages of using FPGAs even if the algorithm shows both high complexity and timing constraints. At the same time, the authors of [26] emphasize in their conclusion that FPGA-based digital processing systems achieve better performance, at a lower cost, than traditional solutions based on DSPs.

Furthermore, the parallel architecture of the FPGA provides an excellent platform for the implementation of paralleled and pipelined structures. This conclusion is drawn by many authors. Thus, implementing an algorithm for color image segmentation for object detection in full parallelism on an FPGA, the
GABIGER-ROSE et al.: FPGA-BASED FULLY SYNCHRONIZED DESIGN OF BILATERAL FILTER 4095
authors of [27] report a drastic improvement of the speed of segmentation compared with the sequential-code-based segmentation. In [28], a design of a fully pipelined data path for real-time face detection using FPGA is described which supports high-speed detection irrespective of the number of faces in an image. The authors of [29] implement their paralleled and fully pipelined hardware for real-time electromagnetic transient simulation on an FPGA and thereby solve a challenging problem of implementation of the complex simulation models.

There are several publications dealing with FPGA implementations of the bilateral filter. In [30], one of these designs is presented. The VHSIC hardware description language (VHDL) code of this design is generated automatically from the models for FPGA synthesis using System Generator from Xilinx. Although the optimization setting for the code generation was for maximum clock frequency, the authors admit that the speed of their implementation for a 15 × 15 pixel filter kernel is insufficient for a real-time application. The authors of [31] compared a VHDL and a high-level synthesis (HLS) description, created by System Generator, of an adaptive impulse noise filter and concluded that higher speed of the system clock can be achieved using VHDL description. Thus, these publications show exemplarily that the handcrafted optimization of an FPGA design regarding both the operating frequency and the resource utilization is still irreplaceable.

A different approach for the FPGA implementation of a real-time bilateral filter has been proposed in [32]. The modified filter is based on the calculation of the filter coefficients from the photometric filter only. The spatial filtering is eliminated due to the processing of the minimal window of 3 × 3 and raising of the derived photometric coefficients to the power of 8. According to the authors, for a moderate noise level, their modified bilateral filter can achieve slightly better results compared to the traditional bilateral filter shown in [1]. However, the original bilateral filter can be tuned by two parameters which are highly responsible for the filtering performance. Unfortunately, no description of the parameters used for this comparison is given in [32].

The work published in [33] is most related to our work. The major parallel to our design consists in implementing the bilateral filter on an FPGA without any modification. This approach is sometimes called the brute-force method. However, the main difference to our work is that the authors developed their design using an HLS tool. The resulting architecture presents a 3 × 3 filter kernel. In contrast, our design is based on an RTL description and presents a 5 × 5 filter kernel. Our design allows high clock frequency and high data throughput and shows only a slight increase of resource demand considering the larger kernel. From this follows that our architecture utilizes hardware resources more efficiently and more economically.

III. BILATERAL FILTER

The bilateral filter [1] embodies the idea of a combination of domain and range filtering. The domain filter averages the nearby pixel values and acts thereby as a low-pass filter. The range filter stands for the nonlinear component and plays an important part in edge preserving. This component allows averaging of similar pixel values only, regardless of their position in the filter window. If the value of a pixel in the filter window diverges from the value of the pixel being filtered by a certain amount, the pixel is skipped.

Taking Gaussian noise into account, the shift-variant filtering operation of the bilateral filter is given by

\[
\bar{\phi}(\bar{\mathbf{m}}_0) = \frac{1}{k(\mathbf{m}_0)} \sum_{\mathbf{m} \in F} \phi(\mathbf{m}) \cdot s\bigl(\phi(\mathbf{m}_0), \phi(\mathbf{m})\bigr) \cdot c(\mathbf{m}_0, \mathbf{m}). \tag{1}
\]

The term $\mathbf{m} = (m, n)$ denotes the pixel coordinates in the image to be filtered, and $\mathbf{m}_0 = (m_0, n_0)$ and $\bar{\mathbf{m}}_0 = (\bar{m}_0, \bar{n}_0)$ represent the coordinates of the centered pixel in the noisy and in the filtered images, respectively. With these notations, $\bar{\phi}(\bar{\mathbf{m}}_0)$ means the gray value of the pixel being filtered, and $\phi(\mathbf{m})$ identifies the gray value of the spatially neighboring pixels to $\phi(\mathbf{m}_0)$ in the filter window $F$.

The following expressions (2) and (3) describe the photometric and the geometric components $s(\phi(\mathbf{m}_0), \phi(\mathbf{m}))$ and $c(\mathbf{m}_0, \mathbf{m})$, respectively:

\[
s\bigl(\phi(\mathbf{m}_0), \phi(\mathbf{m})\bigr) = \exp\left(-\frac{1}{2}\left(\frac{\phi(\mathbf{m}_0) - \phi(\mathbf{m})}{\sigma_{ph}}\right)^{2}\right) \tag{2}
\]

\[
c(\mathbf{m}_0, \mathbf{m}) = \exp\left(-\frac{1}{2}\left(\frac{\lVert \mathbf{m}_0 - \mathbf{m} \rVert}{\sigma_{c}}\right)^{2}\right) \tag{3}
\]

where parameters $\sigma_{ph}$ and $\sigma_{c}$ regulate the width of the Gaussian curve assigned to $s(\phi(\mathbf{m}_0), \phi(\mathbf{m}))$ and $c(\mathbf{m}_0, \mathbf{m})$, respectively. The photometric component compares the gray value of the centered pixel with the gray values of the spatial neighborhood and computes the corresponding weight coefficients depending on the factor $\sigma_{ph}$. The more the absolute difference of the gray values exceeds $\sigma_{ph}$, the lower is the corresponding filter coefficient and vice versa. The domain filter $c(\mathbf{m}_0, \mathbf{m})$ acts as a standard low-pass filter, the weights of which decrease with the spatial distance of the centered pixel to the pixels in the neighborhood.

Normalization with

\[
k(\mathbf{m}_0) = \sum_{\mathbf{m} \in F} s\bigl(\phi(\mathbf{m}_0), \phi(\mathbf{m})\bigr) \cdot c(\mathbf{m}_0, \mathbf{m}) \tag{4}
\]

guarantees that the range of the filtered images does not change significantly due to the filtering. Owing to the fact that the coefficients of the photometric component cannot be computed in advance, the division by the normalization factor cannot be avoided by means of prescaling of the filter coefficients.

IV. DESIGN CONCEPT

The image data, as well as all constants and coefficients used in the following design concept, are integer numbers. As discussed in Section VI, there is no need to implement floating-point computation. With the aid of the presented design concept, the bilateral filter can be realized as a highly parallelized pipeline structure giving great importance to the effective resource utilization. In this paper, the data paths are detailed. The description of the control signals is not addressed here.
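As a software reference for (1)-(4) of Section III, the following minimal sketch applies the unmodified (brute-force) bilateral filter to a small grayscale image. It is not the RTL design described in this paper: it uses floating-point arithmetic, and the function name `bilateral_filter`, the `radius` parameter, and the choice to leave border pixels unfiltered are assumptions made only for illustration.

```python
import math

def bilateral_filter(img, sigma_ph, sigma_c, radius=2):
    """Brute-force bilateral filter following (1)-(4).

    img is a list of rows of gray values; radius=2 gives a 5 x 5 window.
    Border pixels without a full window are copied unchanged (assumption).
    """
    rows, cols = len(img), len(img[0])
    out = [list(row) for row in img]
    for m0 in range(radius, rows - radius):
        for n0 in range(radius, cols - radius):
            center = img[m0][n0]
            acc = 0.0   # weighted sum of gray values, numerator of (1)
            norm = 0.0  # normalization factor k(m0) from (4)
            for dm in range(-radius, radius + 1):
                for dn in range(-radius, radius + 1):
                    value = img[m0 + dm][n0 + dn]
                    # photometric weight (2)
                    s = math.exp(-0.5 * ((center - value) / sigma_ph) ** 2)
                    # geometric weight (3), squared Euclidean distance
                    c = math.exp(-0.5 * (dm * dm + dn * dn) / sigma_c ** 2)
                    acc += value * s * c
                    norm += s * c
            out[m0][n0] = acc / norm
    return out
```

On a constant image the output equals the input, and across a sharp step with a small photometric parameter the pixels on either side keep their values, which is the edge-preserving behavior described in Section III.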
Fig. 2. Principle of the input data retrieval for the image filtering.
B. Photometric Component
After the register matrix has been filled, the grouped image
data are provided to the photometric filter component which
is pictured in Fig. 4. At the output of the photometric filter, the
weighted pixels appear, still sorted into groups, accompanied by
the “weighted mid_pix.” Additionally, the photometric coeffi-
cients have to be forwarded for the required normalization at the
last stage of the filtering according to (4). Thus, in parallel to the
pixels, the photometric coefficients also have to be processed by
the geometric filter in order to obtain the normalization factor
defined in (4). For this reason, the output of the photometric
filter consists of the following:
1) weighted pixels sorted into groups 0 . . . 5;
2) the weighted pixel being filtered, marked by “mid_pix”;
3) photometric coefficients corresponding to groups 0 . . . 5.
In further stages of the design, the weighted pixel values, i.e., the outputs of the multipliers, are named by their groups 0 . . . 5.

A detailed functional flow block diagram of the photometric filter is shown in Fig. 5. The pixel in the center of the filter window has to be available during the calculation of the required 24 pixel weights. Latching the centered pixel allows the computation of the gray value differences between the centered pixel and the remaining pixels inside of the filter window. Each group contains four pixels. A separate pipeline belonging to each group makes it possible to process the entire neighborhood of “mid_pix” within one pixel clock cycle. All six pipelines are designed identically.

Fig. 6. Processing order of input data in the photometric filter component.

The arrangement and the processing order of the input data of the photometric component are shown in Fig. 6. At the first internal clock event t0, the first pixels of each group are provided to the respective pipeline. At the second internal clock event t1, the second pixels of each group enter the component. This organization of groups allows the processing of the whole filter window in four internal clock cycles corresponding to one pixel cycle. In the upper part of Fig. 5, the processing path for the group 0 is shown; in the lower part, there is the processing path for the group 5.
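The processing order described above can be mimicked in software as follows. This is only a behavioral sketch under stated assumptions: the function name `photometric_stage` is hypothetical, the weights are computed in floating point with (2) rather than with the quantized integer coefficients of the design, and the six pipelines that run in parallel in hardware are visited one after another inside each internal clock event.

```python
import math

def photometric_stage(groups, mid_pix, sigma_ph):
    """Weight 24 neighbors arranged as six groups of four gray values.

    Returns (weighted, coeffs): per group, the weighted pixels and the
    photometric coefficients that are forwarded for the normalization.
    """
    weighted = [[] for _ in groups]
    coeffs = [[] for _ in groups]
    n_cycles = len(groups[0])
    for t in range(n_cycles):               # internal clock events t0 ... t3
        for g, group in enumerate(groups):  # six pipelines, parallel in hardware
            pixel = group[t]
            s = math.exp(-0.5 * ((mid_pix - pixel) / sigma_ph) ** 2)
            coeffs[g].append(s)
            weighted[g].append(s * pixel)
    return weighted, coeffs
```

With six groups of four pixels, the loop over `t` runs four times, which corresponds to the four internal clock cycles per pixel cycle mentioned in the text.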
Fig. 9. Vertical part of the geometric filter component.

Fig. 10. Horizontal part of the geometric filter component.

or 8 pixels at the same distance from the centered pixel. For the simplicity of the design, it makes sense to assemble the pixels into equally large groups. Smaller groups allow for better handling of the design. For this reason, the pixels are divided into groups of four with regard to the subsequent processing explained in the following sections. After the accumulation of the pixels according to their symmetry, the sum is multiplied by the corresponding coefficient. The horizontal processing is done in the same way.

The coefficients for the geometric component are scaled in such a manner that the sum of the vertical coefficients (and the horizontal ones, respectively) is equivalent to the so-called normalized one [35]. For the signed coefficients with the word length W, the normalized one is equal to $2^{W-1}$. This means that the division of the weighted gray values and photometric coefficients after geometric filtering can be realized as a simple shift operation. In the last stage, the normalized filtered gray value has to be divided by the normalized product of the photometric coefficients. The geometric coefficients are calculated in advance and stored in a block RAM.

1) Vertical Component Part: The first stage of the geometric component is the vertical part which is pictured in Fig. 9. With the aid of Fig. 6, it can be seen that the pixels of the first column numbered 1, 2, 3, 4, 5 and the first pixel of the middle column numbered 11 enter the vertical component part simultaneously. For the corresponding photometric coefficients, the same order of processing is valid.

The groups 0, 1, 2, 3, 4, which means all columns with the exception of the centered column, are processed as shown in the upper part of Fig. 9. The geometrically symmetrical pixels are cumulated at first and then multiplied by the geometric weight coefficient. All coefficients for the geometric filter are constant for the chosen filter window size. Due to the scaling of the geometric coefficients, it is assured that the accumulation does not result in a carry. The registers “REGcol 0,1,2” in this part of the design are used to delay weighted data to maintain synchronicity. After the multiplication, the weighted values are summed up by the adder tree to one value at each internal clock event.

The processing of the centered column is detailed in the lower part of Fig. 9. The centered pixel is weighted and delayed by “REGcen” so that this pixel and the remaining pixels in the centered column can be fed to the input of the adder tree simultaneously. The remaining pixels enter the dedicated processing path one by one. They were multiplexed in the register matrix in the way that they can be combined pairwise and multiplied by the same coefficient in the geometric component. In order to weight the pixels in a proper way, every incoming pixel is stored in the register “REGcol mid” so that the subsequently calculated sum is valid every second internal clock event. The multiplexing of the filter coefficients with zeros assures that invalid sums vanish due to the multiplying by zero and do not falsify the result.

As it is shown in Fig. 8, the vertical part of the geometric filter for the weighting of the photometric coefficients is designed identically.

2) Horizontal Component Part: In Fig. 10, the horizontal part of the geometric component is displayed. After processing in the vertical dimension, the filter window is reduced to one row, and its elements are computed at one internal clock event each. In order to be able to reuse the symmetrical design, the values of the filtered columns 0, 1, 3, 4 are stored in the shift registers according to the order of their reception. The filtered photometrical coefficients are stored in the same way. Since the content of the shift register in the left part of Fig. 10 is valid at every fourth internal clock event, the time domain changes
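The coefficient scaling and the symmetric accumulation of the geometric component can be illustrated with a short sketch. It assumes Gaussian coefficients from (3) for one dimension, a kernel size of 5, and a word length W = 8. The rounding strategy (round each tap, then absorb the remaining error in the center tap) and the function names are assumptions for illustration and are not taken from the design itself.

```python
import math

def scaled_coefficients(sigma_c, kernel_size=5, word_length=8):
    """Integer 1-D coefficients whose sum equals the normalized one 2**(W-1)."""
    half = kernel_size // 2
    one = 1 << (word_length - 1)
    raw = [math.exp(-0.5 * (d / sigma_c) ** 2) for d in range(-half, half + 1)]
    total = sum(raw)
    coeffs = [int(round(one * r / total)) for r in raw]
    # Absorb the rounding error in the center tap (assumption) so that the
    # sum is exactly 2**(W-1) and the kernel stays symmetric.
    coeffs[half] += one - sum(coeffs)
    return coeffs

def filter_1d(values, coeffs, word_length=8):
    """Add symmetric pairs first, multiply once per pair, divide by a shift."""
    half = len(coeffs) // 2
    acc = coeffs[half] * values[half]
    for d in range(1, half + 1):
        acc += coeffs[half - d] * (values[half - d] + values[half + d])
    return acc >> (word_length - 1)
```

Because the coefficients sum to a power of two, the division after the accumulation reduces to the right shift by W − 1 bits, and adding the symmetric pixels before the multiplication halves the number of multipliers needed for the off-center taps.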
D. Normalization

At the final stage, the kernel result has to be normalized by the norm result as shown in Fig. 11. After the final accumulation of these values, they are both divided by the normalized one again. In this manner, the word lengths of the weighted gray values and of the norm are both (W − 1) bits shorter. Finally, after the division, N bits of the final result are forwarded to the output of the bilateral filter.

E. Design Scalability

In previous paragraphs, we detailed the filter design for the 5 × 5 kernel. However, depending on the application, another kernel size might be required. For small images, a 3 × 3 window size is more suitable to prevent blurring. Some authors choose to work with a larger kernel of the size of 11 × 11 pixels [36]. Our design can be scaled for different kernel sizes. Starting at the register matrix, it has to be dimensioned according to the required kernel size. The kernel size in one dimension is denoted by K in the following:

\[
N_{\text{groups}} = K + 1 \tag{5}
\]

where $N_{\text{groups}}$ means the number of the pixel groups. The quantity of the line storages equals K. The number of required multiplexers equals $N_{\text{groups}}$. The multiplexing pattern of the pixels remains unchanged for every kernel size. According to the symmetry of the kernel, the pixels have to be grouped into $N_{\text{groups}}$ groups containing $n_{\text{group\_member}}$ pixels each

\[
n_{\text{group\_member}} = K - 1. \tag{6}
\]

The groups are always built up in the manner that each row except for the middle pixel forms a pixel group. The middle

\[
f_{\text{internal}} = n_{\text{group\_member}} \cdot f_{\text{operating}}. \tag{7}
\]

According to the internal clock frequency $f_{\text{internal}}$, the counter has to be adjusted, which generates the “select” signal for the multiplexers and the enable signal “EnREG” for the horizontal part of the geometric component.

V. IMAGE QUALITY ASSESSMENT

To evaluate the performance of the noise reduction and the accuracy of the detail preservation, criteria for the image quality assessment are required. The criteria chosen in this work are PSNRdB and MSSIM.

1) PSNRdB: The well-known peak-signal-to-noise ratio PSNRdB in decibels is defined as follows:

\[
\mathrm{PSNR}_{\mathrm{dB}} = 20 \cdot \log_{10}\left(\frac{GV_{\max}}{\sqrt{\mathrm{MSE}}}\right) \tag{8}
\]

\[
\mathrm{MSE} = \frac{1}{MN} \sum_{M} \sum_{N} \bigl(\phi_{\mathrm{ref}}(\mathbf{m}) - \tilde{\phi}(\mathbf{m})\bigr)^{2} \tag{9}
\]

where MSE denotes the mean squared error between the image to be compared and the reference image. $GV_{\max}$ represents the maximum gray value depending on the word length after the digitization of the images. The noiseless M × N image with gray values $\phi_{\mathrm{ref}}(\mathbf{m})$ provides the reference for the measurement of the MSE. The gray values $\tilde{\phi}(\mathbf{m})$ originate from the image to be compared. Considering the quality of the noise filter, PSNRdB describes the capability of the filter to suppress noise regardless of the perceived visual quality of the filtered image.
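The PSNR criterion of (8) and (9) can be computed with a few lines of code. The following sketch assumes images given as lists of rows of gray values; the function names `mse` and `psnr_db` and the handling of identical images (returning infinity) are illustrative choices, not part of the original evaluation setup.

```python
import math

def mse(reference, test):
    """Mean squared error between two M x N images, following (9)."""
    rows, cols = len(reference), len(reference[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            total += (reference[r][c] - test[r][c]) ** 2
    return total / (rows * cols)

def psnr_db(reference, test, gv_max=255):
    """Peak signal-to-noise ratio in decibels, following (8)."""
    err = mse(reference, test)
    if err == 0:
        return math.inf  # identical images (assumption for this sketch)
    return 20.0 * math.log10(gv_max / math.sqrt(err))
```

For an 8-b image, `gv_max=255` applies; for other word lengths, the maximum gray value has to be passed explicitly.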
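\[
\mathrm{MSSIM}(\Phi_{\mathrm{ref}}, \tilde{\Phi}) = \frac{1}{J} \sum_{j=1}^{J} \mathrm{SSIM}\bigl(v_j(\phi_{\mathrm{ref}}), v_j(\tilde{\phi})\bigr) \tag{11}
\]

VI. RESULTS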
After an implementation in Matlab, the proposed architecture of the bilateral filter was implemented in VHDL and simulated with ModelSim. A test image was filtered by the Matlab implementation as well as by the ModelSim simulation, and the filtered images were compared. The purpose of this comparison is to analyze the loss of image quality caused by the quantization of the filter coefficients in our FPGA design.
The test image Lighthouse shown in Fig. 12(a) is an 8-b
grayscale image with a size of 512 × 512 pixels. Hence, in the
following, GVmax = 255 is used.
In order to apply the bilateral filter to a color image, the
color data have to be transformed into the CIELab color space
[1]. The structure of the filter remains unchanged. However,
processing of color images is beyond our research interest, so
no results on this topic will be reported.
A. Performance Analysis
For the comparison of the filtering capability between the Matlab implementation and the ModelSim simulation, Gaussian noise with standard deviation σnoise ∈ {10, 20, 30, 40, 50, 60} was added to the test image.
In Fig. 12, the test image is contrasted with its noisy coun-
terpart with σnoise = 20 and two filtered images. The filter
parameters σph = 3 · σnoise and σc = 1 were chosen for the
photometric and geometric components, respectively. For filter-
ing in Matlab, no quantization of the filter coefficients was ap-
plied. The corresponding filtered image is shown in Fig. 12(c).
For the simulation with ModelSim, the coefficient word length W = 8 was used. The simulation result is shown in Fig. 12(d).

Fig. 12. (a) Original image. (b) Noisy image with σnoise = 20. (c) Filtering in Matlab. (d) Filtering in ModelSim.
TABLE I
F ILTERING R ESULTS
TABLE II
S YNTHESIS R ESULT
Fig. 13. Performance comparison of the Matlab implementation and the ModelSim simulation.
No visually distinguishable difference can be observed between the Matlab implementation and the ModelSim simulation. The results of the quantitative comparison between the two are contrasted in Fig. 13 and summarized in Table I. As our recent research shows, an even better PSNRdB can be achieved by setting σph as a multiple of the measured standard deviation of the noise rather than as a single constant. Thus, an optimal setting for the filter can be chosen which reduces noise and, at the same time, prevents blurring as far as possible. Exceeding this point causes oversmoothing, and choosing the adjusting parameter below this point leads to insufficient noise suppression. The discussion of this topic is important but beyond the scope of this paper. For more details, refer to [38].
Fig. 13 reveals that, for increasing noise levels, PSNRdB and MSSIM both increase after noise filtering. For a higher standard deviation of the noise, the gain is higher. With our setting σph = 3 · σnoise, averaging with higher weights is performed for increasing noise levels. Owing to this fact, PSNRdB rises by a higher amount. MSSIM also increases because the geometrical component remains narrow, preventing oversmoothing.
B. Verification
For verification, a Virtex-5 FPGA platform equipped with a Virtex XC5VLX50-1 device was used. The shortened synthesis report of the filter design is shown in Table II. A long-term trial proved that the design is suitable for real-time processing. The FPGA board was connected to a camera with a 12-b resolution depth, generating 30 fps at a full resolution of 1024 × 1024 pixels.
Due to the technical specification of the camera, pauses between the frames are necessary so that 30 fps is the maximum achievable frame rate. Thus, the maximum data flow reaches approximately 31.5 Mpixel/s. Consequently, we restricted the clock frequency of our design to 40 MHz in this application. The internal clock frequency is 160 MHz, i.e., four times the pixel clock. With this clock rate, a maximum throughput of 38 fps is possible.
With a different camera, an even higher frame rate is achievable. Using our FPGA platform, the maximum possible internal frequency shown in Table II is 220 MHz. Hence, with the same ratio of four between the internal clock and the pixel clock, the maximum operating frequency of our filter design on the Virtex-5 FPGA considered here equals 55 MHz, which corresponds to a pixel period of 18.18 ns. Considering the image resolution of 1024 × 1024 pixels, the following frame rate can be computed:

$$\left( (1024 \times 1024)\,\frac{\text{pixels}}{\text{frame}} \cdot 18.18\,\frac{\text{ns}}{\text{pixel}} \right)^{-1} = 52.45\,\frac{\text{frames}}{\text{second}}. \tag{12}$$

This calculation is valid only for a throughput of 1 pixel/cycle, which is given by our design.
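The frame-rate figures quoted in this subsection can be checked with a few lines of arithmetic. The following Python sketch is only an illustration and not part of the design: the function names are hypothetical, the clock ratio of four is inferred from the quoted 160 MHz and 40 MHz values, and the pauses between frames required by the camera are ignored.

```python
# Illustrative check of the frame-rate arithmetic; all names are hypothetical.

# Assumption: ratio of internal clock to pixel clock, inferred from 160 MHz / 40 MHz.
CLOCK_RATIO = 4


def pixel_clock_hz(internal_clock_hz: float, ratio: int = CLOCK_RATIO) -> float:
    """Pixel clock obtained by dividing the internal clock by the clock ratio."""
    return internal_clock_hz / ratio


def max_frame_rate(pixel_clock: float, width: int = 1024, height: int = 1024) -> float:
    """Frames per second at a throughput of 1 pixel per pixel-clock cycle."""
    return pixel_clock / (width * height)


def required_pixel_rate(fps: float, width: int = 1024, height: int = 1024) -> float:
    """Average pixel rate in pixels/s delivered at the given frame rate."""
    return fps * width * height


if __name__ == "__main__":
    # Camera used for verification: 30 fps at 1024 x 1024 pixels.
    print(f"camera data flow: {required_pixel_rate(30) / 1e6:.1f} Mpixel/s")
    # 160 MHz is the internal clock used; 220 MHz is the synthesis maximum.
    for f_int in (160e6, 220e6):
        f_pix = pixel_clock_hz(f_int)
        print(f"{f_int / 1e6:.0f} MHz internal -> {f_pix / 1e6:.0f} MHz pixel clock"
              f" -> {max_frame_rate(f_pix):.2f} fps")
```

Under these assumptions, the sketch reproduces the approximately 31.5 Mpixel/s camera data flow, the 38 fps limit at a 40-MHz pixel clock, and the 52.45 fps of (12) at 55 MHz.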
[12] T. Q. Pham and L. J. van Vliet, “Separable bilateral filtering for fast video preprocessing,” in Proc. IEEE ICME, 2005, pp. 1–4.
[13] F. Durand and J. Dorsey, “Fast bilateral filtering for the display of high-dynamic-range images,” ACM Trans. Graph., vol. 21, no. 3, pp. 257–266, Jul. 2002.
[14] S. Paris and F. Durand, “A fast approximation of the bilateral filter using a signal processing approach,” in Proc. ECCV, 2006, pp. 568–580.
[15] J. Chen, S. Paris, and F. Durand, “Real-time edge-aware image processing with the bilateral grid,” ACM Trans. Graph., vol. 26, no. 3, pp. 1–9, Jul. 2007.
[16] Q. Yang, K.-H. Tan, and N. Ahuja, “Real-time O(1) bilateral filtering,” in Proc. IEEE CVPR, 2009, pp. 557–564.
[17] M. M. Bronstein, “Lazy sliding window implementation of the bilateral filter on parallel architectures,” IEEE Trans. Image Process., vol. 20, no. 6, pp. 1751–1756, Jun. 2011.
[18] B. Weiss, “Fast median and bilateral filtering,” ACM Trans. Graph., vol. 25, no. 3, pp. 519–526, Jul. 2006.
[19] F. Porikli, “Constant time O(1) bilateral filtering,” in Proc. IEEE CVPR, 2008, pp. 1–8.
[20] Y.-C. Tseng, P.-H. Hsu, and T.-S. Chang, “A 124 Mpixels/sec VLSI design for histogram-based joint bilateral filtering,” IEEE Trans. Image Process., vol. 20, no. 11, pp. 3231–3241, Nov. 2011.
[21] F. Hannig, M. Schmid, J. Teich, and H. Hornegger, “A deeply pipelined and parallel architecture for denoising medical images,” in Proc. IEEE FPT, 2010, pp. 485–490.
[22] L. Costas, P. Colodrón, J. J. Rodríguez-Andina, J. Fariña, and M.-Y. Chow, “Analysis of two FPGA design methodologies applied to an image processing system,” in Proc. IEEE ISIE, 2010, pp. 3040–3044.
[23] N. Sudha and A. R. Mohan, “Hardware-efficient image-based robotic path planning in a dynamic environment and its FPGA implementation,” IEEE Trans. Ind. Electron., vol. 58, no. 5, pp. 1907–1920, May 2011.
[24] R. Marin, G. León, R. Wirz, J. Sales, J. M. Claver, P. J. Sanz, and J. Fernández, “Remote programming of network robots within the UJI industrial robotics telelaboratory: FPGA vision and SNRP network protocol,” IEEE Trans. Ind. Electron., vol. 56, no. 12, pp. 4806–4816, Dec. 2009.
[25] E. Monmasson and M. N. Cirstea, “FPGA design methodology for industrial control systems—A review,” IEEE Trans. Ind. Electron., vol. 54, no. 4, pp. 1824–1842, Aug. 2007.
[26] J. J. Rodriguez-Andina, M. J. Moure, and M. D. Valdes, “Features, design tools, and application domains of FPGAs,” IEEE Trans. Ind. Electron., vol. 54, no. 4, pp. 1810–1823, Aug. 2007.
[27] H. Zhuang, K.-S. Low, and W.-Y. Yau, “Multichannel pulse-coupled neural-network-based color image segmentation for object detection,” IEEE Trans. Ind. Electron., vol. 59, no. 8, pp. 3299–3308, Aug. 2012.
[28] S. Jin, D. Kim, T. T. Nguyen, D. Kim, M. Kim, and J. W. Jeon, “Design and implementation of a pipelined datapath for high-speed face detection using FPGA,” IEEE Trans. Ind. Informat., vol. 8, no. 1, pp. 158–167, Feb. 2012.
[29] Y. Chen and V. Dinavahi, “Digital hardware emulation of universal machine and universal line models for real-time electromagnetic transient simulation,” IEEE Trans. Ind. Electron., vol. 59, no. 2, pp. 1300–1309, Feb. 2012.
[30] C. Charoensak and F. Sattar, “FPGA design of a real-time implementation of dynamic range compression for improving television picture,” in Proc. IEEE ICICS, 2007, pp. 1–5.
[31] A. Rosado-Muñoz, M. Bataller-Mompeán, E. Soria-Olivas, C. Scarante, and J. F. Guerrero-Martínez, “FPGA implementation of an adaptive filter robust to impulsive noise: Two approaches,” IEEE Trans. Ind. Electron., vol. 58, no. 3, pp. 860–870, Mar. 2011.
[32] T. Q. Vinh, J. H. Park, Y.-C. Kim, and S. H. Hong, “FPGA implementation of real-time edge-preserving filter for video noise reduction,” in Proc. IEEE ICCEE, 2008, pp. 611–614.
[33] H. Dutta, F. Hannig, J. Teich, B. Heigl, and H. Hornegger, “A design methodology for hardware acceleration of adaptive filter algorithms in image processing,” in Proc. IEEE ASAP, 2006, pp. 331–340.
[34] R. Chen, L. Chen, and L. Chen, “System design consideration for digital wheelchair controller,” IEEE Trans. Ind. Electron., vol. 47, no. 4, pp. 898–907, Aug. 2000.
[35] R. Turney, “Two-dimensional linear filtering,” in Application Note: Xilinx FPGAs, 2007, pp. 1–8.
[36] M. Zhang and B. K. Gunturk, “Multiresolution bilateral filter for image denoising,” IEEE Trans. Image Process., vol. 17, no. 12, pp. 2324–2333, Dec. 2008.
[37] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[38] A. Gabiger-Rose, M. Kube, P. Schmitt, R. Weigel, and R. Rose, “Image denoising using bilateral filter with noise-adaptive parameter tuning,” in Proc. IEEE IECON, 2011, pp. 4515–4520.

Anna Gabiger-Rose (S’09) was born in Ordshonikidse, Ukraine, in 1978. She received the Dipl.-Ing. degree in electrical engineering, electronics, and information technology from the Friedrich-Alexander University of Erlangen-Nuremberg, Erlangen, Germany, in 2007.
From 2001 to 2007, she was a Student Assistant with the Department of Contactless Test and Measuring Systems, Fraunhofer Institute for Integrated Circuits, Erlangen. She is currently a Research Assistant with the Institute for Electronics Engineering, University of Erlangen-Nuremberg. Her research interests include the design of embedded systems for image processing and the investigation of digital filtering techniques for image quality enhancement.
Mrs. Gabiger-Rose is a member of the IEEE Industrial Electronics Society. She served as a reviewer for the 35th Annual Conference of the IEEE Industrial Electronics Society (IECON09).

Matthias Kube was born in Mainz, Germany, in 1975. He received the Dipl.-Ing. FH (M.Sc.) degree in electrical engineering and microelectronics from the Georg-Simon-Ohm University of Applied Science of Nuremberg, Nuremberg, Germany, in 2002.
Since 2003, he has been working as a member of the research staff at the Department of Contactless Test and Measuring Systems, Fraunhofer Institute for Integrated Circuits, Erlangen, Germany. He has the technical leadership for the development of an innovative indirect converting X-ray detector with conventional optical sensors for scientific and industrial applications of nondestructive testing (NDT), which is optimized for tasks that require a high dynamic range, a high speed, and a long life cycle. His interests in research include optical sensors and cameras, field-programmable-gate-array design, embedded systems for image processing, and X-ray imaging for NDT.

Robert Weigel (S’88–M’89–SM’95–F’02) was born in Ebermannstadt, Germany, in 1956. He received the Dr.-Ing. and Dr.-Ing.habil. degrees in electrical engineering and computer science from the Munich University of Technology, Munich, Germany, in 1989 and 1992, respectively.
He was a Research Engineer from 1982 to 1988, a Senior Research Engineer from 1988 to 1994, and a Professor for RF Circuits and Systems from 1994 to 1996 with the Munich University of Technology. From 1996 to 2002, he was the Director of the Institute for Communications and Information Engineering, University of Linz, Linz, Austria. Since 2002, he has been the Head of the Institute for Electronics Engineering, University of Erlangen-Nuremberg, Erlangen, Germany.
Dr. Weigel was the recipient of the IEEE Microwave Applications Award in 2007. Within IEEE Microwave Theory and Techniques Society (MTT-S), he has been the Founder and Chair of the Austrian Communications/Microwave Theory and Techniques Society Joint Chapter and Region 8 Coordinator. He is the Chair of MTT-2 Microwave Acoustics and the MTT-S President-Elect in 2013.

Richard Rose (S’09) was born in Nuremberg, Germany, in 1981. He received the Dipl.-Ing. degree in electrical engineering, electronics, and information technology from the Friedrich-Alexander University of Erlangen-Nuremberg, Erlangen, Germany, in 2007.
In 2008, he joined the Institute for Electronics Engineering, University of Erlangen-Nuremberg, as a Research Assistant, and since 2010, he has been the Team Leader of the System Engineering group. His research interests include digital signal processing, receiver design, antenna design, localization techniques, and wireless communication systems.
Mr. Rose is a member of the IEEE Microwave Theory and Techniques Society, the IEEE Signal Processing Society, the IEEE Antennas and Propagation Society, and the IEEE Communications Society. He served as a reviewer for the journal of Mathematical Problems in Engineering and the International Journal of Electronics and Communications.