An FPGA-Based Fully Synchronized Design of A Bilateral Filter For Real-Time Image Denoising


IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, VOL. 61, NO. 8, AUGUST 2014 4093

An FPGA-Based Fully Synchronized Design of a Bilateral Filter for Real-Time Image Denoising
Anna Gabiger-Rose, Student Member, IEEE, Matthias Kube, Robert Weigel, Fellow, IEEE, and
Richard Rose, Student Member, IEEE

Abstract—In this paper, a detailed description of a synchronous field-programmable gate array implementation of a bilateral filter for image processing is given. The bilateral filter is chosen for one unique reason: It reduces noise while preserving details. The design is described on register-transfer level. The distinctive feature of our design concept consists of changing the clock domain in a manner that kernel-based processing is possible, which means the processing of the entire filter window at one pixel clock cycle. This feature of the kernel-based design is supported by the arrangement of the input data into groups so that the internal clock of the design is a multiple of the pixel clock given by a targeted system. Additionally, by the exploitation of the separability and the symmetry of one filter component, the complexity of the design is widely reduced. Combining these features, the bilateral filter is implemented as a highly parallelized pipeline structure with very economical and effective utilization of dedicated resources. Due to the modularity of the filter design, kernels of different sizes can be implemented with low effort using our design and given instructions for scaling. As the original form of the bilateral filter with no approximations or modifications is implemented, the resulting image quality depends on the chosen filter parameters only. Due to the quantization of the filter coefficients, only negligible quality loss is introduced.

Index Terms—Bilateral filter, field-programmable gate array (FPGA), image processing, noise reduction, real-time processing.

Manuscript received March 5, 2012; revised August 6, 2012 and October 24, 2012; accepted December 6, 2012. Date of publication October 25, 2013; date of current version February 7, 2014.
A. Gabiger-Rose, R. Weigel, and R. Rose are with the Institute for Electronics Engineering, Friedrich-Alexander University of Erlangen-Nuremberg, 91058 Erlangen, Germany (e-mail: anna.gabiger-rose@fau.de; robert.weigel@fau.de; richard.rose@fau.de).
M. Kube is with the Department of Contactless Test and Measuring Systems, Fraunhofer Institute for Integrated Circuits, 91058 Erlangen, Germany (e-mail: matthias.kube@iis.fraunhofer.de).
Digital Object Identifier 10.1109/TIE.2013.2284133
0278-0046 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

Bilateral filtering has gained great popularity in image processing due to its capability of reducing noise while preserving the structural information of an image. The bilateral filter [1] consists of two components. The detail-preserving property of the filter is mainly caused by the nonlinear filter component, also called the photometric filter. It selects the pixels of similar intensity which are averaged by the linear component afterward. Very often, the linear component is formulated as a low-pass filter. The amount of noise reduction via selective averaging and the amount of the blurring via low-pass filtering are both adjusted by two parameters. The understanding of these parameters is very intuitive, which makes the bilateral filter an almost all-purpose solution in image processing. The authors of [2] and [3] show that noise filtering, despite the prevailing view, does not always imply resolution reduction but can even be used to sharpen the edges [2] or to enhance the flowlike structures [3]. In [4], the motion-adaptive bilateral filter is used for quality improvement in low bit rate video coding. Also, in [5], the bilateral filter is applied for noise reduction in a method for local tone mapping which maps a high dynamic range image to a low dynamic range image.

Recently, bilateral filtering has gained a high awareness level in medical image processing and nondestructive testing. The authors of [6] studied the impact of noise reduction by the bilateral filter applied to the reconstructed images. They concluded that the images processed with this filter show a significant improvement in image quality compared to their unfiltered counterparts. In [7], the authors discuss the results of noise reduction by the bilateral filter in projection space. This means that the noise filtering takes place prior to computing the reconstructed volume. It has been concluded that noise reduction of this kind can be translated into a dose reduction in X-ray computed tomography. Considering industrial applications, the dose reduction permits the reduction of the scanning time and thus allows a higher throughput of test items.

Our own experiments and studies shown in [8] and [9] confirm the possible dose reduction. As the reduction of the exposure time due to filtering is feasible, we are interested in a real-time filtering of projections. Moreover, the filter is not supposed to reduce the spatial resolution of projections, in order to maintain the visibility of defects in a reconstruction. Since we achieve very satisfying results considering detail preservation with our field-programmable gate array (FPGA) implementation presented in [10], we intend to give a deeper insight into our work.

The major contribution of this paper is the detailed description of a novel FPGA design architecture of the bilateral filter on register-transfer level (RTL). This abstraction level is chosen for the possibility of direct specification of the clocking scheme [11]. The main advantages of this design are the capability of real-time processing and economical and effective utilization of resources through the following.
1) Sorting the data into equal groups to which separate pipelines are assigned.
2) Raising the internal clock frequency according to the data flow.
3) No external image buffer is necessary.
Moreover, due to the modularity of the design, it can be extended to implement arbitrary kernel size with low effort. The instructions required for this can be found later in this paper.

The remainder of this paper is organized as follows. In Section II, we consider the related work. After a short description of the bilateral filter in Section III, we give a detailed description of our FPGA design in Section IV. Section IV is the main part of this paper presenting the filter design stage by stage. In Section V, the criteria applied to the evaluation of the image quality prior to and after the noise filtering are detailed. After that, in Section VI, the results are discussed, and the performance potential of our filter design is analyzed.

II. RELATED WORK

Since the bilateral filter is in widespread use, a lot of effort has been put into acceleration for use in practical applications. Among the publications concerning the speeding up of bilateral filtering, two main trends can be identified. One stream is focused on the modification of the filtering components, resulting in an efficient algorithm. Another trend is to accelerate the filtering through parallelizing the algorithm or through hardware acceleration, including modifications of the filter at the same time.

In [12], a fast approximation of the original bilateral filter is proposed. Here, the 2-D filtering is separated into two 1-D operations performing 1-D bilateral filtering in one arbitrary dimension and filtering the intermediate result in the same manner in the subsequent dimension. The authors report that the proportionality of the execution time to the number of filter dimensions decreases from exponential to linear. This approach requires a little memory overhead but results in a filter which is fast enough to be used for preprocessing in video compression systems. However, as the photometric component of the bilateral filter is not separable, the image resulting from the modified filter is documented to be slightly different from the image produced by the original filter.

Another acceleration approach proposed in [13] has given a basis for numerous extensive works. This approach provides a numerical scheme for speeding up the filtering via a piecewise-linear approximation of the bilateral filter in the intensity domain and substituting the low-pass filtering by downsampling. In [14], this technique is extended by transposing the computation to a 3-D space presenting the image intensity as a third dimension over the 2-D image coordinate space. After that, the authors of [15] formulated the concept of the bilateral grid and implemented the bilateral filter using the proposed data structure on three different graphics processing units (GPUs). Only by means of this hardware acceleration does processing at 30 fps become possible, which they regard as real-time performance. Later, the technique proposed in [13] was also implemented on a GPU by the authors of [16] and is also capable of real-time processing with the same frame rate. More recently, the lazy sliding window implementation of the approach in [13] was proposed in [17]. This method is suitable for single-instruction-multiple-data-type processors like DSPs. In this case, the speedup also allows applications requiring real-time performance. The main drawback of the filter acceleration approach discussed so far is the high amount of memory required for the implementation.

Instead of a piecewise-linear approximation and subsampling, the idea of utilizing a histogram-based approach for accelerating the filter is presented in [18] and [19]. The main difference between these two works is that, in [18], a hierarchy of partial distributed histograms on multiple tiers is computed and adjusted for each output pixel while the author of [19] calculates the integral histogram of the image and extracts the histogram for each target filter window to obtain one output pixel. Both methods are fast, but real-time performance of the histogram-based approach in [19] can only be achieved by the very-large-scale-integration design of the filter shown in [20]. The memory demand of the histogram-based acceleration method is also high but is lower than that of the piecewise-linear approximation and subsampling approach.

The aforementioned examples show that a filter modification technique reaches real-time performance only if its implementation utilizes hardware acceleration. Most of the cited works rely on GPUs for acceleration. However, in fields of application in which high power efficiency is crucial, an FPGA solution is preferable. In [21], an algorithm for the denoising of medical images is implemented on an FPGA and four different GPUs. The authors show that the power consumption of their FPGA implementation is always significantly lower. Furthermore, the authors of [21] point out that an FPGA implementation allows latency to be counted in image lines, resulting in delays lower than one frame, while the latency on a GPU is always one frame. This is relevant for many medical applications which demand fast image output to support interactive operations.

The authors of [22] also choose an FPGA implementation for their image processing system because moving time-critical functionalities, like the edge detection in an image, to hardware platforms makes it possible to keep delays in the control loop to a minimum. The authors of [23] and [24] report excellent experience of using FPGAs for motion control of robots based on real-time image processing. The main reason for using FPGAs for real-time robotics tasks is the ability of FPGAs to satisfy the requirement for high computational power and data throughput [24]. Moreover, FPGA solutions offer additional advantages, such as reconfigurability and portability.

However, considering complexity and timing constraints of the algorithm to be implemented, the suitability of the chosen hardware platform has to be checked [25]. A DSP implementation has been regarded to be more appropriate for complex algorithms with high data dependence. For algorithms with low data dependence and high timing constraints, an FPGA solution is more suitable. The authors of [25] discuss in detail the advantages of using FPGAs even if the algorithm shows both high complexity and timing constraints. At the same time, the authors of [26] emphasize in their conclusion that FPGA-based digital processing systems achieve better performance, at a lower cost, than traditional solutions based on DSPs.

Furthermore, the parallel architecture of the FPGA provides an excellent platform for the implementation of paralleled and pipelined structures. This conclusion is made by many authors. Therefore, implementing an algorithm for color image segmentation for object detection in full parallelism on an FPGA, the
authors of [27] report a drastic improvement of the speed of segmentation compared with the sequential-code-based segmentation. In [28], a design of a fully pipelined data path for real-time face detection using FPGA is described which supports high-speed detection irrespective of the number of faces in an image. The authors of [29] implement their paralleled and fully pipelined hardware for real-time electromagnetic transient simulation on an FPGA and thereby solve a challenging problem of implementation of the complex simulation models.

There are several publications dealing with FPGA implementations of the bilateral filter. In [30], one of these designs is presented. The VHSIC hardware description language (VHDL) code of this design is generated automatically from the models for FPGA synthesis using System Generator from Xilinx. Although the optimization setting for the code generation was for maximum clock frequency, the authors admit that the speed of their implementation for a 15 × 15 pixel filter kernel is insufficient for a real-time application. The authors of [31] compared a VHDL and a high-level synthesis (HLS) description, created by System Generator, of an adaptive impulse noise filter and concluded that higher speed of the system clock can be achieved using VHDL description. Thus, these publications show by example that the handcrafted optimization of an FPGA design regarding both the operating frequency and the resource utilization is still irreplaceable.

A different approach for the FPGA implementation of a real-time bilateral filter has been proposed in [32]. The modified filter is based on the calculation of the filter coefficients from the photometric filter only. The spatial filtering is eliminated due to the processing of the minimal window of 3 × 3 and raising of the derived photometric coefficients to the power of 8. According to the authors, for a moderate noise level, their modified bilateral filter can achieve slightly better results compared to the traditional bilateral filter shown in [1]. However, the original bilateral filter can be tuned by two parameters which are highly responsible for the filtering performance. Unfortunately, no description of the parameters used for this comparison is given in [32].

The work published in [33] is most related to our work. The major parallel to our design consists in implementing the bilateral filter on an FPGA without any modification. This approach is sometimes called brute-force method. However, the main difference to our work is that the authors developed their design using an HLS tool. The resulting architecture presents a 3 × 3 filter kernel. In contrast, our design is based on an RTL description and presents a 5 × 5 filter kernel. Our design allows high clock frequency and high data throughput and shows only a slight increase of resource demand considering the larger kernel. From this follows that our architecture utilizes hardware resources more efficiently and more economically.

III. BILATERAL FILTER

The bilateral filter [1] embodies the idea of a combination of domain and range filtering. The domain filter averages the nearby pixel values and acts thereby as a low-pass filter. The range filter stands for the nonlinear component and plays an important part in edge preserving. This component allows averaging of similar pixel values only, regardless of their position in the filter window. If the value of a pixel in the filter window diverges from the value of the pixel being filtered by a certain amount, the pixel is skipped.

Taking Gaussian noise into account, the shift-variant filtering operation of the bilateral filter is given by

$$\bar{\phi}(\bar{\mathbf{m}}_0) = \frac{1}{k(\mathbf{m}_0)} \sum_{\mathbf{m} \in F} \phi(\mathbf{m}) \cdot s\bigl(\phi(\mathbf{m}_0), \phi(\mathbf{m})\bigr) \cdot c(\mathbf{m}_0, \mathbf{m}). \tag{1}$$

The term $\mathbf{m} = (m, n)$ denotes the pixel coordinates in the image to be filtered and $\mathbf{m}_0 = (m_0, n_0)$ and $\bar{\mathbf{m}}_0 = (\bar{m}_0, \bar{n}_0)$ represent the coordinates of the centered pixel in the noisy and in the filtered images, respectively. With these notations, $\bar{\phi}(\bar{\mathbf{m}}_0)$ means the gray value of the pixel being filtered, and $\phi(\mathbf{m})$ identifies the gray value of the spatially neighboring pixels to $\phi(\mathbf{m}_0)$ in the filter window $F$.

The following expressions (2) and (3) describe the photometric and the geometric components $s(\phi(\mathbf{m}_0), \phi(\mathbf{m}))$ and $c(\mathbf{m}_0, \mathbf{m})$, respectively:

$$s\bigl(\phi(\mathbf{m}_0), \phi(\mathbf{m})\bigr) = \exp\left(-\frac{1}{2}\left(\frac{\phi(\mathbf{m}_0) - \phi(\mathbf{m})}{\sigma_{ph}}\right)^{2}\right) \tag{2}$$

$$c(\mathbf{m}_0, \mathbf{m}) = \exp\left(-\frac{1}{2}\left(\frac{\lVert \mathbf{m}_0 - \mathbf{m} \rVert}{\sigma_c}\right)^{2}\right) \tag{3}$$

where parameters $\sigma_{ph}$ and $\sigma_c$ regulate the width of the Gaussian curve assigned to $s(\phi(\mathbf{m}_0), \phi(\mathbf{m}))$ and $c(\mathbf{m}_0, \mathbf{m})$, respectively. The photometric component compares the gray value of the centered pixel with the gray values of the spatial neighborhood and computes the corresponding weight coefficients depending on the factor $\sigma_{ph}$. The more the absolute difference of the gray values exceeds $\sigma_{ph}$, the lower is the corresponding filter coefficient and vice versa. The domain filter $c(\mathbf{m}_0, \mathbf{m})$ acts as a standard low-pass filter, the weights of which are reciprocally proportional to the spatial distance of the centered pixel to the pixels in the neighborhood.

Normalization with

$$k(\mathbf{m}_0) = \sum_{\mathbf{m} \in F} s\bigl(\phi(\mathbf{m}_0), \phi(\mathbf{m})\bigr) \cdot c(\mathbf{m}_0, \mathbf{m}) \tag{4}$$

guarantees that the range of the filtered images does not change significantly due to the filtering. Owing to the fact that the coefficients of the photometric component cannot be computed in advance, the division by the normalization factor cannot be avoided by means of prescaling of the filter coefficients.

IV. DESIGN CONCEPT

The image data, as well as all constants and coefficients used in the following design concept, are integer numbers. As discussed in Section VI, there is no need to implement floating-point computation. With the aid of the presented design concept, the bilateral filter can be realized as a highly parallelized pipeline structure giving great importance to the effective resource utilization. In this paper, the data paths are detailed. The description of the control signals is not addressed here.
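
For orientation, the filter defined by (1)-(4) in Section III can be written as a short floating-point reference model. The sketch below is not the integer RTL design of this paper; the function names `bilateral_pixel` and `bilateral_filter`, the `radius` parameter, and the clipping of the filter window at the image border are assumptions made for illustration only.

```python
import math


def bilateral_pixel(img, r0, c0, sigma_ph, sigma_c, radius=2):
    """Filter one pixel following (1)-(4); radius=2 gives a 5 x 5 window F.

    Assumption: the window is simply clipped at the image border.
    """
    center = img[r0][c0]
    acc = 0.0  # weighted sum, numerator of (1)
    k = 0.0    # normalization factor (4)
    for r in range(max(0, r0 - radius), min(len(img), r0 + radius + 1)):
        for c in range(max(0, c0 - radius), min(len(img[0]), c0 + radius + 1)):
            # photometric weight (2): depends on the gray value difference
            s = math.exp(-0.5 * ((center - img[r][c]) / sigma_ph) ** 2)
            # geometric weight (3): depends on the spatial distance
            dist2 = (r - r0) ** 2 + (c - c0) ** 2
            g = math.exp(-0.5 * dist2 / sigma_c ** 2)
            acc += img[r][c] * s * g
            k += s * g
    return acc / k  # k >= 1 because the centered pixel has weight 1


def bilateral_filter(img, sigma_ph, sigma_c, radius=2):
    """Apply bilateral_pixel to every pixel of a 2-D list of gray values."""
    return [
        [bilateral_pixel(img, r, c, sigma_ph, sigma_c, radius)
         for c in range(len(img[0]))]
        for r in range(len(img))
    ]
```

When the photometric parameter is small compared with the height of an edge, pixels on the other side of the edge receive weights close to zero, so the edge survives the averaging.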

Fig. 1. Order of the functional units of the bilateral filter.

Fig. 2. Principle of the input data retrieval for the image filtering.

For the design description, a window size of 5 × 5 is chosen. This window size is the tradeoff between high noise reduction and low blurring effect.

The design concept for the implementation of the bilateral filter is subdivided into three functional blocks. The block-based design approach reduces design complexity and simplifies validation [34]. Fig. 1 presents these units and their order in the concept. The input data marked by “Data_in” are read line by line and arranged for further processing in the register matrix. The second unit is the photometric filter which weights the input data according to the intensity of the processed pixels. The filtering is completed by the geometric filter, and the filtered data are marked by “Data_out.”

Fig. 3. Register matrix of the kernel-based design concept.

A. Register Matrix

The photometric filter component, also often referred to as a range filter in the related literature, is a nonlinear filter. This means that the filter coefficients change for every filter position. Thus, the pixel weights for the photometric component have to be calculated separately for every pixel in the filter window. The number of weights depends on the filter window size. Here, 24 weights have to be computed for the filtering of one image pixel.

The filter window is shifted first along the input lines representing the image rows, moving one row down every time the preceding row has been filtered. Consequently, the demand arising from this filtering technique is that at least five lines have to be stored for the period of time during which a line is filtered. As an external image buffer is undesired because of the additional expenses of resources due to the memory controller and because of the additional latency due to the memory accesses, the five input lines are stored in the line storages which are implemented as block RAMs for data with N bits. The five input lines are called image rows or rows in the following. These five rows include the row to be filtered, two foregoing rows, and two succeeding rows.

This arrangement is depicted in Fig. 2. The pixel being filtered is marked by “mid_pix.” This pixel and its neighborhood in the solid box represent the kernel of the bilateral filter. After the middle row has been filtered, the outer foregoing row “line storage n-2” moves out of the register matrix. As the input data are read into the register matrix pixel by pixel, the content of the line storages and of the filter kernel is shifted by one pixel at each clock event. This shift emulates the shift of the filter kernel. Acting this way, at the end of an image line, all remaining rows are shifted one row down. The former succeeding row “line storage n + 1” can now be processed. The output lines form the output image which is stored externally.

The parallel calculation of 24 weights in the photometric filter component and the subsequent weighting in the geometric component combined with the final normalization at the filter output require a large amount of resources considering the short time of just one pixel cycle. Due to the flexibility of the clock management in FPGAs, this challenge can be met. The solution is offered by our kernel-based design concept in Fig. 3. The single registers are interconnected in a manner that, aside from the shift of the filter window by one pixel, the entire kernel is provided to the next filter stage simultaneously. This is an important advantage of the presented kernel-based design concept as no extra data buffer is required. On the other hand, it is necessary to process all 25 pixels in one pixel cycle in order to keep up with the reading of the input lines into the register matrix.

The output of the register matrix is sorted into groups, in this case into six groups, and fed into the photometric filter component with the quadruple pixel clock frequency synchronously. The number of the groups is explained by the symmetry of the geometric filter component which is discussed later in Section IV-C. The sorting is done by means of multiplexing the pixels in the manner shown in Fig. 3. The quadruplication of the filter processing clock is implemented by setting the select signal of the multiplexers four times in one pixel clock. Here, the clock domain changes to the fourfold of the input pixel clock. The counter on the top of Fig. 3 generates the select signal and thus controls the readout of the register matrix. This counter is clocked with the quadruple pixel clock as well. The counter is first enabled after the whole register matrix is filled. The pixels in each group are processed in parallel while each group is pipelined through to the register matrix output stage. The pixel in the center of the filter window is not a part of any group and is forwarded to a latch belonging to the input stage of the photometric filter component. The sorting of the pixels into groups and the quadruplication of the pixel clock are the key to the presented synchronous FPGA design concept using a parallelized pipeline architecture.

Fig. 4. Abstract illustration of the photometric filter component.

Fig. 5. Photometric filter component.
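
The data flow of the register matrix in Section IV-A can be imitated in software with a single shift buffer. The following sketch assumes a row-major pixel stream and models the line storages together with the 5 × 5 kernel registers as one `deque` that holds four image lines plus five pixels. The name `windows_5x5` and this buffer layout are illustrative assumptions; they do not reproduce the block RAM and multiplexer structure of Fig. 3.

```python
from collections import deque

K = 5  # kernel size used for the design description


def windows_5x5(pixels, width):
    """Yield (row, col, window) for each fully populated 5 x 5 window.

    pixels: iterable of gray values in row-major order, one per pixel clock.
    width:  number of pixels per image line.
    (row, col) is the position of the centered pixel, i.e. window[2][2].
    """
    # Four complete lines plus five pixels are enough to span the kernel.
    buf = deque(maxlen=(K - 1) * width + K)
    for i, p in enumerate(pixels):
        buf.append(p)  # shift by one pixel per clock event
        row, col = divmod(i, width)
        if row >= K - 1 and col >= K - 1:
            # Kernel rows lie exactly one image line apart in the buffer.
            window = [[buf[r * width + c] for c in range(K)] for r in range(K)]
            yield row - 2, col - 2, window
```

No window is produced until four lines and five pixels have arrived, which mirrors the statement that the readout counter is first enabled after the whole register matrix is filled.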

B. Photometric Component

After the register matrix has been filled, the grouped image data are provided to the photometric filter component which is pictured in Fig. 4. At the output of the photometric filter, the weighted pixels appear, still sorted into groups, accompanied by the “weighted mid_pix.” Additionally, the photometric coefficients have to be forwarded for the required normalization at the last stage of the filtering according to (4). Thus, in parallel to the pixels, the photometric coefficients also have to be processed by the geometric filter in order to obtain the normalization factor defined in (4). For this reason, the output of the photometric filter consists of the following:
1) weighted pixels sorted into groups 0 ... 5;
2) the weighted pixel being filtered, marked by “mid_pix”;
3) photometric coefficients corresponding to groups 0 ... 5.
In further stages of the design, the weighted pixel values, i.e., the outputs of the multipliers, are named by their groups 0 ... 5.

A detailed functional flow block diagram of the photometric filter is shown in Fig. 5. The pixel in the center of the filter window has to be available during the calculation of the required 24 pixel weights. Latching the centered pixel allows the computation of the gray value differences between the centered pixel and the remaining pixels inside of the filter window. Each group contains four pixels. A separate pipeline belonging to each group makes it possible to process the entire neighborhood of “mid_pix” at one pixel clock signal. All six pipelines are designed identically.

Fig. 6. Processing order of input data in the photometric filter component.

The way of arranging and the processing order of the input data of the photometric component are shown in Fig. 6. At the first internal clock event $t_0$, the first pixels of each group are provided to the respective pipeline. At the second internal clock event $t_1$, the second pixels of each group enter the component. This organization of groups allows the processing of the whole filter window in four internal clock cycles corresponding to one pixel cycle. In the upper part of Fig. 5, the processing path for the group 0 is shown; in the lower part, there is the processing path for the group 5.
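
The processing order described for Fig. 6 can be illustrated with a small scheduling sketch. It splits the 24 neighbors of “mid_pix” into six groups of four and lists which pixel enters each of the six pipelines at the four internal clock events of one pixel cycle. The function name `group_schedule` is hypothetical, and the assignment of pixels to groups used here (consecutive neighbors in row-major order) is only a placeholder assumption; the actual assignment follows from the multiplexing in Fig. 3 and the symmetry of the geometric component.

```python
def group_schedule(window):
    """Return (mid_pix, groups, ticks) for one 5 x 5 window.

    groups: six lists of four neighbor pixels (placeholder assignment).
    ticks:  ticks[t][g] is the pixel entering pipeline g at internal clock
            event t, with t = 0..3 inside one pixel clock cycle.
    """
    mid_pix = window[2][2]  # latched separately, not part of any group
    neighbors = [window[r][c]
                 for r in range(5) for c in range(5) if (r, c) != (2, 2)]
    # Assumption: consecutive neighbors form a group; the paper's Fig. 3
    # defines the real grouping.
    groups = [neighbors[g * 4:(g + 1) * 4] for g in range(6)]
    # Six pipelines run in parallel, four internal clock events per pixel.
    ticks = [[groups[g][t] for g in range(6)] for t in range(4)]
    return mid_pix, groups, ticks
```

Six pipelines times four internal clock events cover all 24 neighbors, which is why an internal clock of four times the pixel clock suffices for the 5 × 5 kernel.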
The combinatory blocks “comb.0 ... 5” compute the absolute gray value difference required by (2). In order to keep the design synchronous, the gray values of each pipeline are registered during the difference calculation. The upper path in Fig. 5 shows the required registers labeled “group 0” to make sure that the gray value appears at the input of the multiplier at the same time as the corresponding photometric coefficient. Throughout the following, we use registers to keep our design synchronous. This makes any delay control inside of our architecture redundant.

To avoid the calculation of the expensive exponential, all possible values of the function (2) are precalculated and stored in the lookup table (LUT). The absolute difference of the gray values itself is directly interpreted as the address of the corresponding weight coefficient in the LUT.

Due to the quantization, the number of the weight coefficients is limited. This limit depends on three parameters:
1) the word length N of the input data;
2) the parameter $\sigma_{ph}$;
3) the word length W of the coefficients.
The first point means that increasing the color depth of an image causes a larger amount of intensity differences that have to be stored in the LUT. Depending on the parameter $\sigma_{ph}$, the slope of the Gaussian curve is steeper or flatter, which influences the number of coefficients different from zero after the quantization. Which coefficients actually differ from zero after the quantization depends on the word length W itself.

Fig. 7. Limitation of the number of coefficients.

In Fig. 7, the coefficients are plotted for N = 8 b, W = 8 b, and $\sigma_{ph} = 60$. As the negative exponential converges toward zero for increasing gray value differences, there are only a limited number of quantized coefficients that are different from zero. Considering the example in Fig. 7, there are only 188 coefficients to be stored. For simplification of the internal control, the number of coefficients is extended to the next power of 2, resulting in the highest address $2^P - 1$. In the example, the highest address is 255. The coefficients are stored in the LUT of each pipeline in the initialization phase of the filtering.

If N is greater than P, a logical disjunction of the leftmost (N − P) bits checks whether the gray value difference is greater than the chosen limit $2^P - 1$. The result of the disjunction selects the coefficient address. If the gray value difference is greater than the limit, the weight coefficient is set to zero which is stored at the address $2^P - 1$. In the opposite case, the corresponding coefficient is read out of the LUT. This coefficient may also be zero as the number of coefficients is extended to $2^P - 1$. During the readout of the coefficient, the related gray value is registered for synchronicity. At the next internal clock event, the gray values of each group are multiplied by the corresponding coefficients while registering the coefficients in “coeff. group 0 ... 5” for the final normalization.

The pixel in the center of the filter window does not belong to any group and is processed separately. This pixel is multiplied by the highest coefficient $2^W - 1$ and delayed by registers “photo_k middle” and “geom_in middle” for synchronicity.

C. Geometric Component

For the design of the geometric filter component, advantage is taken of its separability and its symmetry. Because of the separability, the geometric filter is split into the vertical and horizontal parts. Therefore, 2-D filtering is replaced by successive 1-D filtering in vertical and horizontal directions. This solution is preferred in the design of the geometric filter because 1-D filtering can be implemented more efficiently. Both parts are implemented twice to filter the weighted image data and the photometric weights simultaneously which is shown in Fig. 8.

Fig. 8. Abstract illustration of the geometric filter component.

The input of the vertical component parts is the 2-D array of the filter window and the 2-D array of the corresponding coefficients. Each output is a 1-D vector in which each entry represents one filtered and cumulated column. The coefficients of the geometric component are labeled “C_0, C_1, C_2.” The output of the geometric filter consists of the filtered unnormalized gray value (kernel result) and the normalization factor (norm result).

Due to the symmetry of the weight coefficients of the geometric component, the order of multiplication and addition is swapped in both filter parts. This fact plays an important role in pixel group formation. At first, the weighted gray values which are located at the same distance from the centered pixel in the filter window are summed up [35]. Because of the equal distance, these gray values should be weighted with the same coefficient anyway. For a 5 × 5 window, there are always 4

or 8 pixels at the same distance from the centered pixel. For the simplicity of the design, it makes sense to assemble the pixels into equally large groups. Smaller groups allow for better handling of the design. For this reason, the pixels are divided into groups of four with regard to the subsequent processing explained in the following sections. After the accumulation of the pixels according to their symmetry, the sum is multiplied by the corresponding coefficient. The horizontal processing is done in the same way.

The coefficients for the geometric component are scaled in such a manner that the sum of the vertical coefficients (and the horizontal ones, respectively) is equivalent to the so-called normalized one [35]. For the signed coefficients with the word length W, the normalized one is equal to $2^{W-1}$. This means that the division of the weighted gray values and photometric coefficients after geometric filtering can be realized as a simple shift operation. In the last stage, the normalized filtered gray value has to be divided by the normalized product of the photometric coefficients. The geometric coefficients are calculated in advance and stored in a block RAM.

1) Vertical Component Part: The first stage of the geometric component is the vertical part which is pictured in Fig. 9. With the aid of Fig. 6, it can be seen that the pixels of the first column numbered 1, 2, 3, 4, 5 and the first pixel of the middle column numbered 11 enter the vertical component part simultaneously. For the corresponding photometric coefficients, the same order of processing is valid.

The groups 0, 1, 2, 3, 4, which means all columns with the exception of the centered column, are processed as shown in the upper part of Fig. 9. The geometrically symmetrical pixels are cumulated at first and then multiplied by the geometric weight coefficient. All coefficients for the geometric filter are constant for the chosen filter window size. Due to the scaling of the geometric coefficients, it is assured that the accumulation does not result in a carry. The registers “REGcol 0,1,2” in this part of the design are used to delay weighted data to maintain synchronicity. After the multiplication, the weighted values are summed up by the adder tree to one value at each internal clock event.

The processing of the centered column is detailed in the lower part of Fig. 9. The centered pixel is weighted and delayed by “REGcen” so that this pixel and the remaining pixels in the centered column can be fed to the input of the adder tree simultaneously. The remaining pixels enter the dedicated processing path one by one. They were multiplexed in the register matrix in the way that they can be combined pairwise and multiplied by the same coefficient in the geometric component. In order to weight the pixels in a proper way, every incoming pixel is stored in the register “REGcol mid” so that the subsequently calculated sum is valid every second internal clock event. The multiplexing of the filter coefficients with zeros assures that invalid sums vanish due to the multiplying by zero and do not falsify the result.

As it is shown in Fig. 8, the vertical part of the geometric filter for the weighting of the photometric coefficients is designed identically.

Fig. 9. Vertical part of the geometric filter component.

Fig. 10. Horizontal part of the geometric filter component.

2) Horizontal Component Part: In Fig. 10, the horizontal part of the geometric component is displayed. After processing in the vertical dimension, the filter window is reduced to one row, and its elements are computed at one internal clock event each. In order to be able to reuse the symmetrical design, the values of the filtered columns 0, 1, 3, 4 are stored in the shift registers according to the order of their reception. The filtered photometrical coefficients are stored in the same way. Since the content of the shift register in the left part of Fig. 10 is valid at every fourth internal clock event, the time domain changes

here to the domain of the pixel clock. This domain change is indicated by the dashed line in Fig. 10. All operations on the right-hand side of the dashed line are executed according to the pixel clock.

At every pixel clock signal, the valid column values are written to the registers which perform the division of the weighted gray values by the normalized ones. The division is implemented through a shift operation. The remaining processing is similar to the processing described in the previous paragraph. The geometrically symmetrical pixels are cumulated at first and multiplied afterward by the geometric weight coefficient. For the geometric filtering in the horizontal direction, the same geometric coefficients are used as for the vertical filtering. The final division by the normalized one is performed in the next stage.

D. Normalization

At the final stage, the kernel result has to be normalized by the norm result as shown in Fig. 11. After the final accumulation of these values, they are both divided by the normalized one again. In this manner, the word lengths of the weighted gray values and of the norm are both (W − 1) bits shorter. Finally, after the division, N bits of the final result are forwarded to the output of the bilateral filter.

Fig. 11. Final normalization of the filtered data.

E. Design Scalability

In previous paragraphs, we detailed the filter design for the 5 × 5 kernel. However, depending on the application, another kernel size might be required. For small images, a 3 × 3 window size is more suitable to prevent blurring. Some authors choose to work with a larger kernel of the size of 11 × 11 pixels [36]. Our design can be scaled for different kernel sizes. Starting at the register matrix, it has to be dimensioned according to the required kernel size. The kernel size in one dimension is denoted by K in the following:

$$N_{groups} = K + 1 \tag{5}$$

where $N_{groups}$ means the number of the pixel groups. The quantity of the line storages equals K. The number of required multiplexers equals $N_{groups}$. The multiplexing pattern of the pixels remains unchanged for every kernel size. According to the symmetry of the kernel, the pixels have to be grouped into $N_{groups}$ groups containing $n_{group\_member}$ pixels each

$$n_{group\_member} = K - 1. \tag{6}$$

The groups are always built up in the manner that each row except for the middle pixel forms a pixel group. The middle column represents the last pixel group in which particular attention has to be paid to the arrangement of the pixels in order to keep the weighting in the geometric component valid.

Furthermore, the number of pipelines, including combinational blocks and coefficient LUTs in the photometric component, equals $N_{groups}$. The design of the pipelines remains the same. The number of the pipelines in the vertical part of the geometric component changes according to the kernel size. For the structure in the upper part of Fig. 9, (K + 1)/2 pipelines are required because the geometrical symmetry of the pixels has to be taken into account. The lower part of the vertical geometric component remains unchanged except for the multiplexer which has $n_{group\_member}$ inputs according to the required filter window size. The shift register of the horizontal part of the geometric component has to be dimensioned for (K − 1) values. The number of the connected pipelines has to be adjusted to the length of the shift register, taking the geometrical symmetry into account again. The processing of the centered column remains unchanged. The same holds for the normalization coefficients as well.

Finally, if the maximal operating frequency $f_{operating}$ is known, the internal clock frequency $f_{internal}$ can be determined as follows:

$$f_{internal} = n_{group\_member} \cdot f_{operating}. \tag{7}$$

According to the internal clock frequency $f_{internal}$, the counter has to be adjusted, which generates the “select” signal for the multiplexers and the enable signal “EnREG” for the horizontal part of the geometric component.

V. IMAGE QUALITY ASSESSMENT

To evaluate the performance of the noise reduction and the accuracy of the detail preservation, criteria for the image quality assessment are required. The criteria chosen in this work are $\mathrm{PSNR}_{dB}$ and MSSIM.

1) $\mathrm{PSNR}_{dB}$: The well-known peak-signal-to-noise ratio $\mathrm{PSNR}_{dB}$ in decibels is defined as follows:

$$\mathrm{PSNR}_{dB} = 20 \cdot \log_{10}\left(\frac{GV_{max}}{\sqrt{\mathrm{MSE}}}\right) \tag{8}$$

$$\mathrm{MSE} = \frac{1}{MN} \sum_{M} \sum_{N} \left(\phi_{ref}(m) - \tilde{\phi}(m)\right)^{2} \tag{9}$$

where MSE denotes the mean squared error between the image to be compared and the reference image. $GV_{max}$ represents the maximum gray value depending on the word length after the digitization of the images. The noiseless M × N image with gray values $\phi_{ref}(m)$ provides the reference for the measurement of the MSE. The gray values $\tilde{\phi}(m)$ originate from the image to be compared. Considering the quality of the noise filter, $\mathrm{PSNR}_{dB}$ describes the capability of the filter to suppress noise regardless of the perceived visual quality of the filtered image.
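The PSNR definition in (8) and (9) can be checked numerically with a minimal Python sketch. The function name `psnr_db`, the default `gv_max = 255` (an 8-b image), and the sample data are assumptions made for illustration; they are not part of the described design.

```python
import numpy as np

def psnr_db(reference, test, gv_max=255.0):
    """PSNR in decibels following (8) and (9).

    gv_max is the maximum gray value; 255 is assumed here for 8-b images.
    """
    ref = np.asarray(reference, dtype=np.float64)
    tst = np.asarray(test, dtype=np.float64)
    if ref.shape != tst.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((ref - tst) ** 2)            # (9): mean squared error
    if mse == 0.0:
        return float("inf")                    # identical images
    return float(20.0 * np.log10(gv_max / np.sqrt(mse)))  # (8)

# Hypothetical data: a constant error of 5 gray values gives MSE = 25.
ref = np.full((4, 4), 100.0)
noisy = ref + 5.0
print(round(psnr_db(ref, noisy), 2))  # 20*log10(255/5) -> 34.15
```

A larger value means less residual error relative to the reference; as stated above, it says nothing about perceived visual quality.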

2) MSSIM: The mean structural similarity index MSSIM is a method for the assessment of the image quality that takes advantage of the characteristics of the human visual system [37]. First, the local structural similarity SSIM of the 11 × 11 image blocks $v(\phi_{ref})$ and $v(\tilde{\phi})$ is calculated

$$\mathrm{SSIM}\big(v(\phi_{ref}), v(\tilde{\phi})\big) = l\big(v(\phi_{ref}), v(\tilde{\phi})\big) \cdot c\big(v(\phi_{ref}), v(\tilde{\phi})\big) \cdot s\big(v(\phi_{ref}), v(\tilde{\phi})\big) \tag{10}$$

where $l(v(\phi_{ref}), v(\tilde{\phi}))$ is the luminance comparison function, $c(v(\phi_{ref}), v(\tilde{\phi}))$ compares the contrast of the image blocks after luminance subtraction, and $s(v(\phi_{ref}), v(\tilde{\phi}))$ conducts the structure comparison after contrast normalization. After averaging the SSIM of J blocks over the whole image, the mean value MSSIM

$$\mathrm{MSSIM}(\Phi_{ref}, \tilde{\Phi}) = \frac{1}{J} \sum_{j=1}^{J} \mathrm{SSIM}\big(v_j(\phi_{ref}), v_j(\tilde{\phi})\big) \tag{11}$$

of an entire image represented by $\tilde{\Phi}$ is identified. The value MSSIM = 1 means that two images are completely identical. The smaller the MSSIM, the less the structural similarity that the two images show. The detailed description of MSSIM can be found in [37].
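As a rough illustration of (10) and (11), the following Python sketch evaluates SSIM on non-overlapping 11 × 11 blocks and averages the results. It is a simplification under stated assumptions: the luminance, contrast, and structure terms and the constants K1 = 0.01 and K2 = 0.03 follow the common formulation from [37], plain (unweighted) block statistics are used instead of a Gaussian-weighted sliding window, and the function names are hypothetical.

```python
import numpy as np

def ssim_block(x, y, gv_max=255.0, k1=0.01, k2=0.03):
    """SSIM of two equally sized blocks as the product l * c * s, see (10)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * gv_max) ** 2          # stabilizing constants (assumed from [37])
    c2 = (k2 * gv_max) ** 2
    c3 = c2 / 2.0
    mu_x, mu_y = x.mean(), y.mean()
    sd_x, sd_y = x.std(), y.std()
    cov = np.mean((x - mu_x) * (y - mu_y))
    lum = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)   # luminance
    con = (2 * sd_x * sd_y + c2) / (sd_x ** 2 + sd_y ** 2 + c2)   # contrast
    struc = (cov + c3) / (sd_x * sd_y + c3)                       # structure
    return lum * con * struc

def mssim(ref, test, block=11, gv_max=255.0):
    """Mean SSIM over J non-overlapping blocks, see (11)."""
    ref = np.asarray(ref, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    rows, cols = ref.shape
    if rows < block or cols < block:
        raise ValueError("image is smaller than one block")
    values = [
        ssim_block(ref[r:r + block, c:c + block],
                   test[r:r + block, c:c + block], gv_max)
        for r in range(0, rows - block + 1, block)
        for c in range(0, cols - block + 1, block)
    ]
    return float(np.mean(values))
```

With this sketch, comparing an image with itself yields 1, and any deviation lowers the value, matching the interpretation given above.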

VI. RESULTS
After an implementation in Matlab, the proposed architecture of the bilateral filter was implemented in VHDL and simulated with ModelSim. A test image was filtered by both the Matlab implementation and the ModelSim simulation, and the filtered images were compared. The purpose of this comparison is to analyze the drop in image quality caused by the quantization of the filter coefficients in our FPGA design.
The test image Lighthouse shown in Fig. 12(a) is an 8-b
grayscale image with a size of 512 × 512 pixels. Hence, in the
following, GVmax = 255 is used.
In order to apply the bilateral filter to a color image, the
color data have to be transformed into the CIELab color space
[1]. The structure of the filter remains unchanged. However,
processing of color images is beyond our research interest, so
no results on this topic will be reported.

A. Performance Analysis
For the comparison of the filtering capability between
the Matlab implementation and the ModelSim simulation,
Gaussian noise with standard deviation σnoise = [10, 20, 30, 40,
50, 60] was added to the test image.
In Fig. 12, the test image is contrasted with its noisy coun-
terpart with σnoise = 20 and two filtered images. The filter
parameters σph = 3 · σnoise and σc = 1 were chosen for the
photometric and geometric components, respectively. For filter-
ing in Matlab, no quantization of the filter coefficients was ap-
plied. The corresponding filtered image is shown in Fig. 12(c).
For the simulation with ModelSim, the coefficient word length W = 8 was used. The simulation result is shown in Fig. 12(d).

Fig. 12. (a) Original image. (b) Noisy image with σnoise = 20. (c) Filtering in Matlab. (d) Filtering in ModelSim.
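For orientation, an unquantized floating-point reference comparable in spirit to the Matlab run can be sketched in Python. This is not the authors' code: the Gaussian forms of the photometric and geometric weights are assumed from the original bilateral filter [1], the edge-replicating border handling is an arbitrary choice, and the function name and test data are hypothetical.

```python
import numpy as np

def bilateral_filter(image, sigma_ph, sigma_c=1.0, kernel=5):
    """Brute-force bilateral filter without coefficient quantization."""
    img = np.asarray(image, dtype=np.float64)
    half = kernel // 2
    padded = np.pad(img, half, mode="edge")   # border handling: assumption
    rows, cols = img.shape
    acc = np.zeros_like(img)                  # unnormalized kernel result
    norm = np.zeros_like(img)                 # normalization factor
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            shifted = padded[half + dy:half + dy + rows,
                             half + dx:half + dx + cols]
            # geometric weight: depends only on the distance to the center
            w_c = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_c ** 2))
            # photometric weight: depends on the gray value difference
            w_ph = np.exp(-((shifted - img) ** 2) / (2.0 * sigma_ph ** 2))
            acc += w_c * w_ph * shifted
            norm += w_c * w_ph
    return acc / norm                         # norm >= 1 (center weight is 1)

# Hypothetical test data: flat image plus Gaussian noise with sigma_noise = 20.
rng = np.random.default_rng(1)
clean = np.full((32, 32), 128.0)
sigma_noise = 20.0
noisy = clean + rng.normal(0.0, sigma_noise, clean.shape)
filtered = bilateral_filter(noisy, sigma_ph=3.0 * sigma_noise, sigma_c=1.0)
```

The call uses the parameter choice stated above, with the photometric sigma set to three times the noise standard deviation and the geometric sigma set to 1, on a 5 × 5 window.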

Between the Matlab implementation and the ModelSim simulation, no visually distinguishable difference can be registered.

The results of the quantitative comparison between the Matlab implementation and the ModelSim simulation are contrasted in Fig. 13 and summarized in Table I. As our recent research shows, by adjusting $\sigma_{ph}$ as a multiple of the measured standard deviation of noise rather than by a single constant, even better $\mathrm{PSNR}_{dB}$ can be achieved. Thus, an optimal setting for the filter can be chosen which reduces noise and prevents blurring at the same time as far as possible. Exceeding this point causes oversmoothing, and choosing the adjusting parameter below this point leads to insufficient noise suppression. The discussion of this topic is important but beyond the scope of this paper. For more details, refer to [38].

Fig. 13. Performance comparison of the Matlab implementation and the ModelSim simulation.

Fig. 13 reveals that, for increasing noise levels, $\mathrm{PSNR}_{dB}$ and MSSIM both increase after noise filtering. For higher standard deviation of noise, the gain is higher. Using our setting $\sigma_{ph} = 3 \cdot \sigma_{noise}$, averaging with higher weights is performed for increasing noise levels. Owing to this fact, $\mathrm{PSNR}_{dB}$ rises by a higher amount. MSSIM also increases because the geometrical component remains narrow, preventing oversmoothing.

TABLE I
FILTERING RESULTS

TABLE II
SYNTHESIS RESULT

The numbers in Table I show that applying the presented filter architecture delivers results almost as good as that of the Matlab implementation. The slight decrease of the image quality due to filtering by ModelSim simulation is explained by coefficient quantization and by rounding of the internal values during the shift operations. No artifacts caused by quantization are introduced into the filtered image. In summary, the simulation results are highly satisfying.

B. Verification

For verification, a Virtex-5 FPGA platform equipped with a Virtex XC5VLX50-1 device was used. The shortened synthesis report of the filter design is shown in Table II. A long-term trial proved that the design is suitable for real-time processing. The FPGA board was connected to a camera with a 12-b resolution depth, generating 30 fps at a full resolution of 1024 × 1024 pixels.

Due to the technical specification of the camera, pauses between the frames are necessary so that 30 fps is the maximally achievable frame rate. Thus, the maximal data flow reaches approximately 31.5 Mpixel/s. Consequently, we restricted the clock frequency of our design to 40 MHz in this application. The internal clock frequency is 160 MHz. With this clock rate, a maximal throughput of 38 fps is possible.

With a different camera, an even higher frame rate is achievable. Using our FPGA platform, the maximal possible internal frequency shown in Table II is 220 MHz. Hence, the maximal operating frequency of our filter design with the contemplated FPGA Virtex-5 equals 55 MHz. Considering the image resolution of 1024 × 1024 pixels, the following frame rate can be computed:

$$\left((1024 \times 1024)\,\frac{\text{pixels}}{\text{frame}} \cdot 18.18\,\frac{\text{ns}}{\text{pixel}}\right)^{-1} = 52.45\,\frac{\text{frames}}{\text{second}}. \tag{12}$$

This calculation is valid only for a throughput of 1 pixel/cycle which is given by our design.
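The clock and frame-rate figures quoted in this subsection can be reproduced with a few lines of Python. The sketch assumes that the operating (pixel) clock equals the internal clock divided by the group size K − 1, which is what the 220 MHz and 55 MHz values for the 5 × 5 kernel imply, and that one pixel is delivered per operating clock cycle. The function names are hypothetical.

```python
def group_size(kernel_size):
    """Pixels per group, n_group_member = K - 1, as in (6)."""
    return kernel_size - 1

def operating_frequency_hz(internal_hz, kernel_size):
    """Pixel clock obtained from the internal clock (assumed relation)."""
    return internal_hz / group_size(kernel_size)

def frame_rate_fps(pixel_clock_hz, width, height):
    """Frames per second at a throughput of 1 pixel per clock cycle."""
    return pixel_clock_hz / (width * height)

f_op = operating_frequency_hz(220e6, kernel_size=5)
print(f_op / 1e6)                                   # 55.0 (MHz)
print(round(frame_rate_fps(f_op, 1024, 1024), 2))   # 52.45, compare (12)
print(round(frame_rate_fps(40e6, 1024, 1024), 2))   # 38.15, the camera setup
```

The same helper gives 110 MHz for a 3 × 3 kernel at a 220 MHz internal clock, since the group size drops to 2.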

The total delay of the output pixels of our architecture with a kernel size of 5 × 5 pixels applied to an image of 512 × 512 pixels is 2560 + 36 cycles. The time required for filling up the register matrix, depending on the kernel size and image width, results in a delay of 5 × 512 = 2560 cycles. The processing time from the multiplexers in the register matrix to the output of the normalization stage is constant and does not depend on the kernel size. The critical operations are performed at internal clock frequency. If the kernel size is changed, the pixel groups have to be reordered, and the internal clock has to be adjusted according to (7). In this case, the processing time still accounts for 36 cycles. The normalization by division costs 24 cycles, which accounts for about two-thirds (24 of 36 cycles) of the whole processing time.

For the evaluation of the performance of the filter design, a comparison with other implementations from the references is given in Table III. Except for the authors of [32], all other authors implement the original bilateral filter from [1]. From [32], the full parallel architecture is used for the comparison in Table III. All filters are implemented on different FPGAs of different families and generations, which makes the comparison less significant, but still, itemizing some features like the maximum clock frequency of the design or the resource demand might give a good insight.

TABLE III
CITED FPGA IMPLEMENTATIONS OF THE BILATERAL FILTER

Our design works at the highest clock frequency. However, considering the kernel size of 5 × 5 pixels and the switching of the time domain, our architecture presents only the third highest frame rate. The picture changes if we implement a 3 × 3 filter kernel. In this case, the operating frequency is 110 MHz, and the resulting frame rate doubles, which puts the performance of our design in second place.

Regarding the resource demand, it should be clear that the logic elements of Altera and the logic slices of Xilinx are built differently. The values in Table III give merely a hint at the FPGA area used by each design. On the other hand, the number of required multipliers can be compared directly. In [30], the number of the multipliers is not available. According to the statement of the authors of [33], an efficient parallel implementation of a bilateral filter for a 5 × 5 mask requires 25 multipliers. We have shown that our design concept is efficient and it requires only 23 multipliers. Therefore, considering the implemented window size of 5 × 5 pixels, we use the resources more economically.

VII. CONCLUSION

In this paper, we have given a detailed description of an FPGA design of the bilateral filter for real-time image processing. The advantages of our design can be summarized in the following points.

1) The filter design for a kernel size of 5 × 5 shown here utilizes the FPGA resources economically, which makes it feasible to implement the filter on a common medium-sized FPGA.
2) The introduced register matrix at the first stage of the filter makes external image storage redundant, contributing to the decrease of the resource demand of the filter implementation.
3) The shown architecture is synchronous and capable of real-time processing supporting high clock frequencies. Maximal operating frequency depends on the chosen FPGA family.
4) Conceiving our filter architecture, we kept in mind the scalability of the design in order to enable the implementation of arbitrary filter window size with low effort.
5) The shown filter architecture assures a constant processing delay independent of the filter window size. The total delay is the sum of the processing delay and the fill-up time of the line storages which depends on the kernel size and image width.
6) Image quality assessment in terms of $\mathrm{PSNR}_{dB}$ and structural similarity assured that the image quality loss due to coefficient quantization and due to rounding of the internal results is negligible.

REFERENCES

[1] C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” in Proc. IEEE ICCV, 1998, pp. 839–846.
[2] B. Zhang and J. P. Allebach, “Adaptive bilateral filter for sharpness enhancement and noise removal,” IEEE Trans. Image Process., vol. 17, no. 5, pp. 664–678, May 2008.
[3] B. Yan and A.-D. Saleh, “Structure enhancing bilateral filtering of images,” in Proc. IEEE PCSPA, 2010, pp. 614–617.
[4] M. de-Frutos-López, H. Medina-Chanca, S. Sanz-Rodríguez, C. Peláez-Moreno, and F. Díaz-de-María, “Perceptually-aware bilateral filter for quality improvement in low bit rate video coding,” in Proc. IEEE PCS, 2012, pp. 477–480.
[5] J. Won Lee, R.-H. Park, and S. Chang, “Noise reduction and adaptive contrast enhancement for local tone mapping,” IEEE Trans. Consum. Electron., vol. 58, no. 2, pp. 578–586, May 2012.
[6] J. Giraldo, Z. Kelm, L. Yu, J. Fletcher, B. Erickson, and C. McCollough, “Comparative study of two image space noise reduction methods for computed tomography: Bilateral filter and nonlocal means,” in Proc. Conf. IEEE EMBS, 2009, pp. 3529–3532.
[7] L. Yu, A. Manduca, J. Trzasko, N. Khaylova, J. Kofler, C. McCollough, and J. Fletcher, “Sinogram smoothing with bilateral filtering for low-dose CT,” in Proc. SPIE Med. Imag.: Phys. Med. Imag., 2008, vol. 6913, pp. 691329-1–691329-8.
[8] A. Gabiger, R. Weigel, S. Oeckl, and P. Schmitt, “Enhancement of CT image quality via bilateral filtering of projections,” in Proc. 1st Int. Conf. Image Formation X-ray Comput. Tomography, 2010, pp. 140–143.
[9] A. Gabiger-Rose, R. Rose, M. Kube, P. Schmitt, and R. Weigel, “Noise adaptive bilateral filtering of projections for computed tomography,” in Proc. 11th Int. Meet. Fully Three-Dimens. Image Reconstruction Radiol. Nucl. Med., 2011, pp. 306–309.
[10] A. Gabiger, M. Kube, and R. Weigel, “A synchronous FPGA design of a bilateral filter for image processing,” in Proc. IEEE IECON, 2009, pp. 1990–1995.
[11] T. Riesgo, Y. Torroja, and E. de la Torre, “Design methodologies based on hardware description languages,” IEEE Trans. Ind. Electron., vol. 46, no. 1, pp. 3–12, Feb. 1999.

[12] T. Q. Pham and L. J. van Vliet, “Separable bilateral filtering for fast video preprocessing,” in Proc. IEEE ICME, 2005, pp. 1–4.
[13] F. Durand and J. Dorsey, “Fast bilateral filtering for the display of high-dynamic-range images,” ACM Trans. Graph., vol. 21, no. 3, pp. 257–266, Jul. 2002.
[14] S. Paris and F. Durand, “A fast approximation of the bilateral filter using a signal processing approach,” in Proc. ECCV, 2006, pp. 568–580.
[15] J. Chen, S. Paris, and F. Durand, “Real-time edge-aware image processing with the bilateral grid,” ACM Trans. Graph., vol. 26, no. 3, pp. 1–9, Jul. 2007.
[16] Q. Yang, K.-H. Tan, and N. Ahuja, “Real-time O(1) bilateral filtering,” in Proc. IEEE CVPR, 2009, pp. 557–564.
[17] M. M. Bronstein, “Lazy sliding window implementation of the bilateral filter on parallel architectures,” IEEE Trans. Image Process., vol. 20, no. 6, pp. 1751–1756, Jun. 2011.
[18] B. Weiss, “Fast median and bilateral filtering,” ACM Trans. Graph., vol. 25, no. 3, pp. 519–526, Jul. 2006.
[19] F. Porikli, “Constant time O(1) bilateral filtering,” in Proc. IEEE CVPR, 2008, pp. 1–8.
[20] Y.-C. Tseng, P.-H. Hsu, and T.-S. Chang, “A 124 Mpixels/sec VLSI design for histogram-based joint bilateral filtering,” IEEE Trans. Image Process., vol. 20, no. 11, pp. 3231–3241, Nov. 2011.
[21] F. Hannig, M. Schmid, J. Teich, and H. Hornegger, “A deeply pipelined and parallel architecture for denoising medical images,” in Proc. IEEE FPT, 2010, pp. 485–490.
[22] L. Costas, P. Colodrón, J. J. Rodríguez-Andina, J. Fariña, and M.-Y. Chow, “Analysis of two FPGA design methodologies applied to an image processing system,” in Proc. IEEE ISIE, 2010, pp. 3040–3044.
[23] N. Sudha and A. R. Mohan, “Hardware-efficient image-based robotic path planning in a dynamic environment and its FPGA implementation,” IEEE Trans. Ind. Electron., vol. 58, no. 5, pp. 1907–1920, May 2011.
[24] R. Marin, G. León, R. Wirz, J. Sales, J. M. Claver, P. J. Sanz, and J. Fernández, “Remote programming of network robots within the UJI industrial robotics telelaboratory: FPGA vision and SNRP network protocol,” IEEE Trans. Ind. Electron., vol. 56, no. 12, pp. 4806–4816, Dec. 2009.
[25] E. Monmasson and M. N. Cirstea, “FPGA design methodology for industrial control systems—A review,” IEEE Trans. Ind. Electron., vol. 54, no. 4, pp. 1824–1842, Aug. 2007.
[26] J. J. Rodriguez-Andina, M. J. Moure, and M. D. Valdes, “Features, design tools, and application domains of FPGAs,” IEEE Trans. Ind. Electron., vol. 54, no. 4, pp. 1810–1823, Aug. 2007.
[27] H. Zhuang, K.-S. Low, and W.-Y. Yau, “Multichannel pulse-coupled neural-network-based color image segmentation for object detection,” IEEE Trans. Ind. Electron., vol. 59, no. 8, pp. 3299–3308, Aug. 2012.
[28] S. Jin, D. Kim, T. T. Nguyen, D. Kim, M. Kim, and J. W. Jeon, “Design and implementation of a pipelined datapath for high-speed face detection using FPGA,” IEEE Trans. Ind. Informat., vol. 8, no. 1, pp. 158–167, Feb. 2012.
[29] Y. Chen and V. Dinavahi, “Digital hardware emulation of universal machine and universal line models for real-time electromagnetic transient simulation,” IEEE Trans. Ind. Electron., vol. 59, no. 2, pp. 1300–1309, Feb. 2012.
[30] C. Charoensak and F. Sattar, “FPGA design of a real-time implementation of dynamic range compression for improving television picture,” in Proc. IEEE ICICS, 2007, pp. 1–5.
[31] A. Rosado-Muñoz, M. Bataller-Mompeán, E. Soria-Olivas, C. Scarante, and J. F. Guerrero-Martínez, “FPGA implementation of an adaptive filter robust to impulsive noise: Two approaches,” IEEE Trans. Ind. Electron., vol. 58, no. 3, pp. 860–870, Mar. 2011.
[32] T. Q. Vinh, J. H. Park, Y.-C. Kim, and S. H. Hong, “FPGA implementation of real-time edge-preserving filter for video noise reduction,” in Proc. IEEE ICCEE, 2008, pp. 611–614.
[33] H. Dutta, F. Hannig, J. Teich, B. Heigl, and H. Hornegger, “A design methodology for hardware acceleration of adaptive filter algorithms in image processing,” in Proc. IEEE ASAP, 2006, pp. 331–340.
[34] R. Chen, L. Chen, and L. Chen, “System design consideration for digital wheelchair controller,” IEEE Trans. Ind. Electron., vol. 47, no. 4, pp. 898–907, Aug. 2000.
[35] R. Turney, “Two-dimensional linear filtering,” in Application Note: Xilinx FPGAs, 2007, pp. 1–8.
[36] M. Zhang and B. K. Gunturk, “Multiresolution bilateral filter for image denoising,” IEEE Trans. Image Process., vol. 17, no. 12, pp. 2324–2333, Dec. 2008.
[37] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[38] A. Gabiger-Rose, M. Kube, P. Schmitt, R. Weigel, and R. Rose, “Image denoising using bilateral filter with noise-adaptive parameter tuning,” in Proc. IEEE IECON, 2011, pp. 4515–4520.

Anna Gabiger-Rose (S’09) was born in Ordshonikidse, Ukraine, in 1978. She received the Dipl.-Ing. degree in electrical engineering, electronics, and information technology from the Friedrich-Alexander University of Erlangen-Nuremberg, Erlangen, Germany, in 2007.
From 2001 to 2007, she was a Student Assistant with the Department of Contactless Test and Measuring Systems, Fraunhofer Institute for Integrated Circuits, Erlangen. She is currently a Research Assistant with the Institute for Electronics Engineering, University of Erlangen-Nuremberg. Her research interests include the design of embedded systems for image processing and the investigation of digital filtering techniques for image quality enhancement.
Mrs. Gabiger-Rose is a member of the IEEE Industrial Electronics Society. She served as a reviewer for the 35th Annual Conference of the IEEE Industrial Electronics Society (IECON09).

Matthias Kube was born in Mainz, Germany, in 1975. He received the Dipl.-Ing. FH (M.Sc.) degree in electrical engineering and microelectronics from the Georg-Simon-Ohm University of Applied Science of Nuremberg, Nuremberg, Germany, in 2002.
Since 2003, he has been working as a member of the research staff at the Department of Contactless Test and Measuring Systems, Fraunhofer Institute for Integrated Circuits, Erlangen, Germany. He has the technical leadership for the development of an innovative indirect converting X-ray detector with conventional optical sensors for scientific and industrial applications of nondestructive testing (NDT), which is optimized for tasks that require a high dynamic range, a high speed, and a long life cycle. His interests in research include optical sensors and cameras, field-programmable-gate-array design, embedded systems for image processing, and X-ray imaging for NDT.

Robert Weigel (S’88–M’89–SM’95–F’02) was born in Ebermannstadt, Germany, in 1956. He received the Dr.-Ing. and Dr.-Ing.habil. degrees in electrical engineering and computer science from the Munich University of Technology, Munich, Germany, in 1989 and 1992, respectively.
He was a Research Engineer from 1982 to 1988, a Senior Research Engineer from 1988 to 1994, and a Professor for RF Circuits and Systems from 1994 to 1996 with the Munich University of Technology. From 1996 to 2002, he was the Director of the Institute for Communications and Information Engineering, University of Linz, Linz, Austria. Since 2002, he has been the Head of the Institute for Electronics Engineering, University of Erlangen-Nuremberg, Erlangen, Germany.
Dr. Weigel was the recipient of the IEEE Microwave Applications Award in 2007. Within IEEE Microwave Theory and Techniques Society (MTT-S), he has been the Founder and Chair of the Austrian Communications/Microwave Theory and Techniques Society Joint Chapter and Region 8 Coordinator. He is the Chair of MTT-2 Microwave Acoustics and the MTT-S President-Elect in 2013.

Richard Rose (S’09) was born in Nuremberg, Germany, in 1981. He received the Dipl.-Ing. degree in electrical engineering, electronics, and information technology from the Friedrich-Alexander University of Erlangen-Nuremberg, Erlangen, Germany, in 2007.
In 2008, he joined the Institute for Electronics Engineering, University of Erlangen-Nuremberg, as a Research Assistant, and since 2010, he has been the Team Leader of the System Engineering group. His research interests include digital signal processing, receiver design, antenna design, localization techniques, and wireless communication systems.
Mr. Rose is a member of the IEEE Microwave Theory and Techniques Society, the IEEE Signal Processing Society, the IEEE Antennas and Propagation Society, and the IEEE Communications Society. He served as a reviewer for the journal of Mathematical Problems in Engineering and the International Journal of Electronics and Communications.
