
Student – Metehău Denis-George
Specialization – Computer Science
Year 1, Group 2, full-time studies

Hardware-Accelerated High-Quality Reconstruction of Volumetric Data on PC Graphics Hardware

{Hadwiger, Hauser}@VRVis.at, http://www.VRVis.at/vis/
{theussl, groeller}@cg.tuwien.ac.at, http://www.cg.tuwien.ac.at/home/

Abstract

We describe a method for exploiting commodity 3D graphics hardware in order to achieve hardware-accelerated high-quality filtering with arbitrary filter kernels. Our approach is based on reordering the evaluation of the filter convolution sum to accommodate the way the hardware works. We exploit multiple rendering passes together with the capability of current graphics hardware to index into several textures at the same time (multi-texturing).

The method we present is applicable in one, two, and three dimensions. The cases we are most interested in up to now are two-dimensional reconstruction of object-aligned slices through volumetric data, and three-dimensional reconstruction of arbitrarily oriented slices. As a fundamental building block, the basic algorithm can be used in order to directly render an entire volume by blending a stack of slices reconstructed with high quality on top of each other.

We have used bicubic and tricubic B-splines and Catmull-Rom splines, as well as windowed sinc filters, as reconstruction kernels. However, it is important to emphasize that our approach has no fundamental restrictions with regard to the filters that can be used.

Keywords: filtering, high-order reconstruction, slice rendering, volume rendering, 3D hardware acceleration

1 Introduction

A fundamental problem in computer graphics is how to reconstruct images and volumes from sampled data. The process of determining the original continuous data, or at least a sufficiently accurate approximation, from discrete input data is usually called reconstruction. In volume visualization, the input data is commonly given at evenly spaced discrete locations in three-space. In theory, the original volumetric data can be reconstructed entirely, provided certain conditions are honored (cf. the sampling theorem [18]). In reality, of course, filtering is always a trade-off between performance and reconstruction quality. This is especially true for hardware implementations. Reconstruction in graphics hardware is usually done by using simple linear interpolation. In the most common case of two-dimensional texture mapping hardware, bilinear filtering is employed in order to reconstruct texture images. Analogously, 3D graphics accelerators that are able to use three-dimensional textures filter them using trilinear interpolation.

Linear interpolation is fast, but introduces significant reconstruction artifacts. On the other hand, a lot of research in the last few years has been devoted to improving reconstruction by using high-order reconstruction filters [8, 12, 13, 14, 21]. Among the investigated filters are piecewise cubic functions, as well as windowed ideal reconstruction functions (windowed sinc filters). However, these functions were usually deemed too slow to be used in practice. Using such high-order reconstruction filters in software indeed easily leads to exploding rendering times.

In this paper, we will show how to exploit consumer 3D graphics hardware for accelerating high-order reconstruction of volumetric data. The presented approach works in one, two, and three dimensions. The cases most interesting to us up to now are the reconstruction of images or slices in two dimensions, and reconstruction of oblique slices through volumetric data. An interesting application of such slices is to use them for direct volume rendering. Standard texture mapping hardware can be exploited for volume rendering by blending a stack of texture-mapped slices on top of each other [3]. These slices can be either viewport-aligned, which requires 3D texture mapping hardware [9, 23], or object-aligned, where 2D texture mapping hardware suffices [19]. Our high-quality filtering approach can be used to considerably improve the reconstruction quality of the individual slices in both of these cases, thus increasing the quality of the entire rendered volume.

The method we present works by evaluating the filter convolution sum in a different order than the one naturally used when filtering in software. Instead of gathering the contributions of all neighboring input samples in order to calculate a single output sample at a time, we distribute the contribution of a single input sample to all neighboring output samples. These contributions are added up over multiple rendering passes in hardware, eventually yielding the same result as the standard evaluation of the convolution sum. In order to improve performance, each pass employs multiple textures at the same time, for multiplying input samples by corresponding values in the filter kernel.

The structure of the paper is as follows. After discussing related work in section 2, we describe our method for hardware-accelerated high-order filtering in section 3. We discuss the basic idea and principle in sections 3.1 and 3.2, before applying it to reconstruction of images and axis-aligned slices of volumetric data in section 3.3, and arbitrarily oriented slices in section 3.4. In the latter case, although the target is two-dimensional, reconstruction has to be done in three dimensions. Three-dimensional reconstruction for three-dimensional images is described in section 3.5, where we describe how our approach can be applied to volume rendering using a stack of slices. Section 4 describes some issues of current commodity graphics hardware, especially with respect to what our method requires or can make use of. Section 5 presents results. Section 6 summarizes what we have presented, tries to draw some general conclusions, and wraps up with future work.

2 Related work

The work relevant to ours is twofold. First, we are using filter theory and results on what types of filters are good in which situations. Second, since we propose several methods and applications making heavy use of hardware acceleration, work that has been done in these areas is also very relevant.

There is a vast amount of literature on filter analysis and design.
Keys [7] derived a family of cardinal splines and showed, using a Taylor series expansion, that the Catmull-Rom spline, which is in this family of cubic splines, is numerically most accurate. Mitchell and Netravali [11] derived another family of cubic splines quite popular in computer graphics, the BC-splines. Marschner and Lobb [8] compared linear interpolation, cubic splines, and windowed sincs. They concluded that linear interpolation is certainly the cheapest option and will likely remain the method of choice for time-critical applications. Cubic splines perform quite well, especially those BC-splines that fulfill 2C + B = 1, which include the Catmull-Rom spline. Windowed sincs can provide arbitrarily good reconstruction, although they are much more expensive. Möller et al. [12] provide a general framework for analyzing filters in the spatial domain, again using a Taylor series expansion of the convolution sum. They use this framework to analyze the cardinal splines [12], affirming that the Catmull-Rom splines are numerically most accurate, and the BC-splines [13], affirming that the filters with parameters fulfilling 2C + B = 1 are numerically most accurate. Since numerical considerations alone may not always be appropriate, they also show how to use their framework to design accurate and smooth reconstruction filters [14]. Turkowski [22] used windowed ideal reconstruction filters for image resampling tasks. Theußl et al. [21] used the framework developed by Möller et al. [12] to assess the quality of windowed reconstruction filters and to derive optimal values for the parameters of Kaiser and Gaussian windows.

In the field of hardware-accelerated visualization, Meißner et al. [10] have done a thorough comparison of the four prevalent approaches to volume rendering, one of them being the use of texture mapping hardware. Hardware-accelerated texture mapping has been used to accelerate volume rendering for quite some time. The original SGI RealityEngine [1] introduced 3D textures. They can be used effectively for rendering volumes at interactive frame rates, as shown by Cabral et al. [3], and Cullip and Neumann [4], for example. These early approaches all stored a preshaded volume in a 3D texture, which had to be recomputed whenever the viewing parameters, lighting conditions, or transfer function changed. Westermann and Ertl [23] introduced several approaches for accelerating volume rendering with graphics hardware that is capable of three-dimensional texture mapping and supports a color matrix. They were able to render monochrome images of shaded isosurfaces at interactive rates, without explicitly extracting any geometry, and performed shading on-the-fly, without requiring the volume texture to be updated. Meißner et al. [9] have extended this approach for semi-transparent volume rendering, and are able to apply both classification and colored shading at run-time without changing the volume texture itself. Their approach additionally requires the SGI pixel textures extension of OpenGL [20]. Dachille et al. [5] used a mixed-mode approach between applying a transfer function and shading in software, and rendering the resulting texture volume in hardware. Many of the approaches presented earlier have been adapted to standard PC graphics hardware later on, e.g., by Rezk-Salama et al. [19]. They are only using two-dimensional textures instead of three-dimensional ones, exploiting the multi-texturing and multi-stage rasterization capabilities of NVIDIA GeForce graphics cards. The high-quality filtering approach we are proposing can be used in conjunction with many of the methods previously described.

The OpenGL [17] imaging subset (introduced in OpenGL 1.2) offers convolution capabilities, but these are primarily intended for image processing and unfortunately cannot be used for the kind of reconstruction we desire. Furthermore, these features are usually not supported natively by the graphics hardware, especially not by consumer PC graphics accelerators. Hopf and Ertl [6] have shown how to achieve 3D convolution by building upon two-dimensional convolution in OpenGL.

3 Hardware-Accelerated High-Order Reconstruction

This section presents our approach for filtering input data by convolving it with an arbitrary filter kernel stored in multiple texture maps, exploiting low-cost 3D graphics hardware. This method can be used in order to achieve hardware-accelerated high-quality reconstruction of volumetric data. The basic principle is most easily illustrated in the case of one-dimensional input data (sections 3.1 and 3.2), and can also be applied in two (section 3.3), as well as three dimensions (sections 3.4 and 3.5).

3.1 Basic principle

Since we want to be able to employ arbitrary filter kernels for reconstruction, we have to evaluate the well-known filter convolution sum:

    g(x) = (f * h)(x) = \sum_{i = \lfloor x \rfloor - m + 1}^{\lfloor x \rfloor + m} f[i] \, h(x - i)    (1)

This equation describes a convolution of the discrete input samples f[i] with a continuous reconstruction filter h(x). In the case of reconstruction, this is essentially a sliding average of the samples and the reconstruction filter. If the function to be reconstructed was bandlimited and sampled properly, using the sinc function as reconstruction filter would result in perfect reconstruction. However, the sinc function is impracticable because of its infinite extent. Therefore, in practice finite approximations to the sinc filter are used, which yield results of varying quality, depending on the properties of these approximations. In equation 1, the (finite) half-width of the filter kernel is denoted by m.

In order to be able to exploit standard graphics hardware for performing this computation, we do not use the evaluation order usually employed in software-based filtering. The convolution sum is commonly evaluated in its entirety for a single output sample at a time. That is, all the contributions of neighboring input samples (their values multiplied by the corresponding filter values) are gathered and added up in order to calculate the final value of a certain output sample. This "gathering" of contributions is shown in figure 1(a). This figure uses a simple tent filter as an example. It shows how a single output sample is calculated by adding up two contributions. The first contribution is gathered from the neighboring input sample on the left-hand side, and the second one is gathered from the input sample on the right-hand side. In the case of this example, the convolution results in linear interpolation, due to the tent filter employed. For generating the desired output data in its entirety, this is done for all corresponding resampling points (output sample locations).

Our method uses a different evaluation order. Instead of focusing on a single output sample at any one time, we calculate the contribution of a single input sample to all corresponding output sample locations (resampling points) first. That is, we distribute the contribution of an input sample to its neighboring output samples, instead of the other way around. This "distribution" of contributions is shown in figure 1(b). In this case, the final value of a single output sample is only available when all corresponding contributions of input samples have been distributed to it.

We evaluate the convolution sum in the order outlined above, since the distribution of the contributions of a single relative input sample can be done in hardware for all output samples (pixels) simultaneously. The final result is gradually built up over multiple rendering passes. The term relative input sample location denotes a relative offset of an input sample with respect to a set of output samples.
Figure 1: Gathering vs. distribution of input sample contributions (tent filter): (a) Gathering all contributions to a single output sample; (b) Distributing a single input sample's contribution (tent filter not shown).
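As a concrete check of equation (1), the tent filter case can be written out. The following worked instance (added here for illustration) uses the tent filter h(x) = 1 - |x| with half-width m = 1, and shows that the sum has exactly two non-zero terms and reduces to linear interpolation, as claimed above:

```latex
% Tent filter: h(x) = 1 - |x| for |x| <= 1, and 0 otherwise (m = 1).
% Writing \alpha = x - \lfloor x \rfloor for the fractional position:
g(x) = f[\lfloor x \rfloor]\, h(\alpha) + f[\lfloor x \rfloor + 1]\, h(\alpha - 1)
     = (1 - \alpha)\, f[\lfloor x \rfloor] + \alpha\, f[\lfloor x \rfloor + 1]
```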

In the example of a one-dimensional tent filter (like in figure 1(a)), there are two relative input sample locations. One could be called the "left-hand neighbor," the other the "right-hand neighbor." In the first pass, the contribution of all respective left-hand neighbors is calculated. The second pass then adds the contribution of all right-hand neighbors. Note that the number of passes depends on the filter kernel used, see below.

Thus, the same part of the filter convolution sum is added to the previous result for each pixel at the same time, yielding the final result after all parts have been added up. From this point of view, the graph in figure 1(b) depicts both rendering passes that are necessary for reconstruction with a one-dimensional tent filter, but only with respect to the contribution of a single input sample. The contributions distributed simultaneously in a single pass are depicted in figures 2 and 3, respectively. In the first pass, shown in figure 2, the contributions of all relative left-hand neighbors are distributed. Consequently, the second pass, shown in figure 3, distributes the contributions of all relative right-hand neighbors. Adding up the distributed contributions of these two passes yields the final result for all resampling points (i.e., linearly interpolated output values in this example).

Figure 2: Distributing the contributions of all "left-hand" neighbors; tent filter.

Figure 3: Distributing the contributions of all "right-hand" neighbors; tent filter.
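The two evaluation orders can be contrasted in a few lines of code. The following is an illustrative sketch (plain C++, no graphics API; the function names and the integer zoom-factor resampling are our own assumptions, not the authors' implementation), upsampling a one-dimensional signal with the tent filter both ways:

```cpp
#include <cmath>
#include <vector>

// Tent filter, width two (half-width m = 1): h(x) = 1 - |x| for |x| <= 1.
static float tent(float x) { return std::fabs(x) <= 1.0f ? 1.0f - std::fabs(x) : 0.0f; }

// Gathering: for each output sample, sum the contributions of its neighbors.
std::vector<float> gather(const std::vector<float>& f, int zoom) {
    std::vector<float> g(f.size() * zoom, 0.0f);
    for (std::size_t out = 0; out < g.size(); ++out) {
        float x = static_cast<float>(out) / zoom;          // resampling point
        int i = static_cast<int>(std::floor(x));
        for (int k = i; k <= i + 1; ++k)                   // two non-zero terms
            if (k >= 0 && k < static_cast<int>(f.size()))
                g[out] += f[k] * tent(x - k);
    }
    return g;
}

// Distribution: one "pass" per relative input sample location; each pass adds
// the contribution of that relative neighbor to *all* output samples. In
// hardware, the body of the outer loop is one rendering pass over the image.
std::vector<float> distribute(const std::vector<float>& f, int zoom) {
    std::vector<float> g(f.size() * zoom, 0.0f);
    for (int pass = 0; pass < 2; ++pass) {                 // tent filter: two tiles
        for (std::size_t out = 0; out < g.size(); ++out) {
            float x = static_cast<float>(out) / zoom;
            int i = static_cast<int>(std::floor(x)) + pass; // relative input sample
            if (i >= 0 && i < static_cast<int>(f.size()))
                g[out] += f[i] * tent(x - i);              // accumulate, as blending does
        }
    }
    return g;
}
```

Both functions produce identical output; only the loop nesting differs, and that difference is exactly what allows the inner loop over output samples to run in parallel on the rasterizer.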

3.2 Accommodating graphics hardware

The rationale for the approach described in the previous section is that, at the pixel or fragment level, graphics hardware basically operates by applying simple, identical operations to a lot of pixels simultaneously, or at least in very rapid succession, which for all conceptual purposes can be viewed as fully parallel operation from the outside. These operations have access to relatively few inputs. However, in general filtering based on a convolution of an input signal with a filter kernel, a potentially unbounded number of inputs needs to be available for the calculation of a single output value. Therefore, in order to exploit graphics hardware and accommodate the way it works, our method reorders the evaluation of the filter convolution sum. For a more elaborate discussion of the features of graphics hardware that can be exploited by our approach, as well as additional hardware-related issues, see section 4.

Section 3.1 has already adapted the usual evaluation order of the convolution sum in order to facilitate the use of graphics hardware. We are now going to describe in detail how this basic principle can be employed in order to achieve high-order reconstruction in hardware. The following facts are crucial to the operation of our method, and together form the reason why it works. If we look at filtering and the convolution of an input signal with a filter kernel at a single output sample location, we can observe that, even for a kernel of arbitrary width, each segment from one integer location of the kernel to the next is only relevant for exactly one input sample. In the case of a tent filter (which has width two) this can be seen in figure 1(a), by imagining the filter kernel sliding from the second input sample shown to the third input sample.

From now on we will be calling a segment of width one from one integer location in a filter kernel to the next an integer filter kernel tile, or simply filter tile. Consequently, a one-dimensional tent filter has two filter tiles, corresponding to the fact that it has width two. Looking again at figure 1(a), we can now say that each filter tile covers exactly one input sample, regardless of where the kernel is actually positioned.
If we now imagine all output sample locations (resampling points) between two given input samples simultaneously, instead of concentrating on a single output sample, we can further observe that:

- this area has width one, which is exactly the width of a single filter tile,
- all output samples in this area get a non-zero contribution from each filter tile exactly once,
- as many input samples yield a non-zero contribution as there are filter tiles in the filter kernel, and
- this number is, as defined above, the width of the filter kernel.

Now, instead of imagining the filter kernel being centered at the "current" output sample location, we note that an identical mapping of input samples to filter values can be achieved by replicating a single filter tile, mirrored in all dimensions (in the case of more than one dimension), repeatedly over the output sample grid, see figure 4. The scale of this mapping is chosen so that the size of a single tile corresponds to the width from one input sample to the next. Note that the need for mirroring filter tiles has nothing to do with the fact that a filter kernel is usually mirrored in the filter convolution sum (see equation 1). Mirroring is necessary to compensate for the fact that, instead of "sliding" the filter kernel, individual filter tiles are positioned at (and replicated to) fixed locations. Figure 4 shows how this works in the case of a one-dimensional cubic Catmull-Rom spline, which has width four and thus consists of four filter tiles.

Figure 4: Catmull-Rom spline of width four used for reconstruction of a one-dimensional function in four passes.

We calculate the contribution of a single specific filter tile to all output samples in a single pass. The input samples used in a single pass correspond to a specific relative input sample location or offset with regard to the output sample locations. That is, in one pass the input samples with relative offset zero are used for all output samples, then the samples with offset one in the next pass, and so on. The number of passes necessary is equal to the number of filter tiles the kernel used consists of. Note that the correspondence of passes and filter tiles shown in figure 4 is not simply left-to-right, reflecting the order we are using in our actual implementation in order to avoid unintentional clamping of intermediate results in the frame buffer (see section 4.4).
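Before any rendering, the filter tiles themselves have to be sampled into textures. The following sketch of this precomputation (plain C++; the tile resolution, array layout, and the exact mirroring convention are assumptions made for illustration, not the authors' code) builds the four tiles of the Catmull-Rom kernel of figure 4:

```cpp
#include <cmath>
#include <vector>

// Catmull-Rom spline kernel, width four (support [-2, 2]).
float catmullRom(float x)
{
    x = std::fabs(x);
    if (x < 1.0f) return  1.5f*x*x*x - 2.5f*x*x + 1.0f;
    if (x < 2.0f) return -0.5f*x*x*x + 2.5f*x*x - 4.0f*x + 2.0f;
    return 0.0f;
}

// Sample the four width-one tiles of the kernel into separate arrays, each
// mirrored, so that they can be replicated over the output sample grid with
// a repeating wrap mode (see the discussion of mirroring above).
std::vector<std::vector<float>> buildTiles(int res)
{
    std::vector<std::vector<float>> tiles(4, std::vector<float>(res));
    for (int t = 0; t < 4; ++t)                 // tile t covers [t-2, t-1]
        for (int i = 0; i < res; ++i) {
            float u = (i + 0.5f) / res;         // position inside the tile
            tiles[t][i] = catmullRom((t - 2) + (1.0f - u));   // mirrored sample
        }
    return tiles;
}
```

Each array would then be uploaded as one small texture; due to the symmetry of this kernel, two of the four tiles could in fact be generated from the others by mirroring texture coordinates, as discussed for the 2D and 3D cases below.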
Remember that there are two basic inputs needed by the convolution sum: the input samples, and the filter kernel. Since we change only the order of summation but leave the multiplication untouched, we need these two available at the same time. Therefore, we employ multi-texturing with (at least) two textures and retrieve input samples from the first texture, and filter kernel values from the second texture. Actually, due to the fact that only a single filter tile is needed during a single rendering pass, all tiles are stored and downloaded to the graphics hardware as separate textures. The required replication of tiles over the output sample grid is easily achieved by configuring the hardware to automatically extend the texture domain beyond [0,1] x [0,1] by simply repeating the texture (in OpenGL, via the GL_REPEAT wrap mode). In order to fetch input samples in unmodified form, nearest-neighbor interpolation has to be used for the input texture. The textures containing the filter tiles are sampled using the hardware-native linear interpolation. If a given hardware architecture is able to support 2n textures at the same time, the number of passes can be reduced by a factor of n. That is, with two-texture multi-texturing four passes are needed for filtering with a cubic kernel in one dimension, whereas with four-texture multi-texturing only two passes are needed, etc.

Our approach is not limited to symmetric filter kernels, although symmetry can be exploited in order to save texture memory for the filter tile textures. It is also not limited to separable filter kernels (in two and three dimensions, respectively), as will be shown in the following sections.

3.3 Reconstruction of images and aligned slices

When enlarging images or reconstructing object-aligned slices through volumetric data taken directly from a stack of such slices, high-order two-dimensional filters have to be used in order to achieve high-quality results.

The basic algorithm outlined in the previous section for one dimension can easily be applied in two dimensions, exploiting two-texture multi-texturing hardware in multiple rendering passes. For each output pixel and pass, our method takes two inputs: unmodified (i.e., unfiltered) image values, and filter kernel values. That is, two 2D textures are used simultaneously. One texture contains the entire source image, and the other texture contains the filter tile needed in the current pass, which in this case is a two-dimensional unit square. Figure 5 shows a two-dimensional filter kernel texture. For this image, a bicubic B-spline filter has been used. All sixteen tiles are shown, whereas in reality only four tiles would actually be downloaded to the hardware. Due to symmetry, the other twelve tiles can easily be generated on-the-fly by mirroring texture coordinates. Figure 6 shows one-dimensional cross-sections of additional filter kernels that we have used.

Figure 5: Bicubic B-spline filter kernel; filter tiles separated by white lines.

In addition to using the appropriate filter tile, in each pass an appropriate offset has to be applied to the texture coordinates of the texture containing the input image. As explained in the previous section, each pass corresponds to a specific relative location of an input sample. Thus, the slice texture coordinates have to be offset and scaled in order to match the point-sampled input image grid with the grid of replicated filter tiles. In the case of a cubic filter kernel for bicubic filtering, sixteen passes need to be performed on two-texture multi-texturing hardware.

As mentioned in the previous section, our approach is neither limited to symmetric nor separable filters. However, non-separable filter kernels may need additional passes if they contain both positive and negative areas, see section 4.4. Apart from this, separable as well as non-separable filter kernels are simply stored in two-dimensional filter tile textures.
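As an illustration of this setup, here is a minimal OpenGL sketch of the sixteen-pass bicubic case on two-texture hardware. It assumes a current context with ARB_multitexture (function pointers resolved elsewhere); imageTex and tileTex are hypothetical handles for textures uploaded beforehand, and the sketch ignores tile mirroring and assumes a non-negative kernel such as the bicubic B-spline (Catmull-Rom would additionally need the subtraction passes of section 4.4):

```cpp
#include <GL/gl.h>
#include <GL/glext.h>   // ARB_multitexture tokens

extern GLuint imageTex;      // source image, sampled with GL_NEAREST
extern GLuint tileTex[16];   // 2D filter tiles, GL_LINEAR and GL_REPEAT wrap
extern int    imgW, imgH;    // source image resolution in texels

void renderBicubic()
{
    glEnable(GL_BLEND);
    glBlendFunc(GL_ONE, GL_ONE);                 // passes accumulate additively

    int pass = 0;
    for (int j = -1; j <= 2; ++j)                // relative sample offsets in y
    for (int i = -1; i <= 2; ++i, ++pass) {      // relative sample offsets in x
        glActiveTextureARB(GL_TEXTURE0_ARB);     // unit 0: shifted input image
        glEnable(GL_TEXTURE_2D);
        glBindTexture(GL_TEXTURE_2D, imageTex);
        glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_REPLACE);

        glActiveTextureARB(GL_TEXTURE1_ARB);     // unit 1: replicated filter tile
        glEnable(GL_TEXTURE_2D);
        glBindTexture(GL_TEXTURE_2D, tileTex[pass]);
        glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE);

        float du = (float)i / imgW;              // per-pass offset, one texel
        float dv = (float)j / imgH;              // per relative sample step

        glBegin(GL_QUADS);                       // one polygon covers all outputs
        const float q[4][2] = { {0,0}, {1,0}, {1,1}, {0,1} };
        for (int v = 0; v < 4; ++v) {
            glMultiTexCoord2fARB(GL_TEXTURE0_ARB, q[v][0] + du, q[v][1] + dv);
            // Scaling by the image resolution replicates the tile once per
            // input sample spacing, thanks to the GL_REPEAT wrap mode.
            glMultiTexCoord2fARB(GL_TEXTURE1_ARB, q[v][0] * imgW, q[v][1] * imgH);
            glVertex2f(q[v][0], q[v][1]);
        }
        glEnd();
    }
    glDisable(GL_BLEND);
}
```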
3.4 Reconstruction of oblique slices

When planar slices through 3D volumetric data are allowed to be located and oriented arbitrarily, three-dimensional filtering has to be performed, although the result is still two-dimensional. On graphics hardware, this is usually done by trilinearly interpolating within a 3D texture. Our method can also be applied in this case in order to improve reconstruction quality considerably. The conceptually straightforward extension of the 2D approach described in the previous section, which simultaneously uses two 2D textures, achieves the equivalent for three-dimensional reconstruction by simultaneously using two 3D textures. The first 3D texture contains the input volume in its entirety, whereas the second 3D texture contains the current filter tile, which in this case is a three-dimensional unit cube. In the case of a cubic filter kernel for tricubic filtering, 64 passes need to be performed on two-texture multi-texturing hardware. If such a kernel is symmetric, we download eight 3D textures for the filter tiles, generating the remaining 56 without any performance loss by mirroring texture coordinates. Due to the high memory consumption of 3D textures, it is especially important that the filter kernel need not be downloaded to the graphics hardware in its entirety if it is symmetric. Texture compression can also be used in order to minimize on-board texture memory consumption. Various compression schemes are available for this purpose [16].

Unfortunately, current hardware, especially the ATI Radeon, does not support multi-texturing with two 3D textures. Although the Radeon has three texture units, the most it can do is multi-texturing with one 3D texture and one 2D texture at the same time. Since we need data from two 3D textures available simultaneously, the missing functionality has to be emulated in this case. Conceptually, this is quite simple. It incurs a major performance penalty, however, and thus has only the character of a proof-of-concept. Assuming that only a single 3D texture is available, we provide the second input (after the filter kernel) to the filter convolution sum as the color of flat-shaded polygons. This is possible since the input volume need only be point-sampled. We generate these polygons on-the-fly by intersecting the voxel grid of the input volume with the plane of the slice we want to reconstruct. Each clipped part of the slice that is entirely contained within a single voxel is rendered as a polygon of a single color, the color being manually retrieved from the input volume and assigned to the polygon. At the same time, the current filter tile is activated as a 3D texture filtered via trilinear interpolation. For each output pixel and pass, the color of the polygon (which would normally be fetched from the second 3D texture that is not available) is multiplied by the kernel value fetched from the 3D texture and stored into the frame buffer. This approach ultimately yields the exact same result as multi-texturing with two 3D textures, but of course uses a lot more polygons. In the multi-texture approach, just a single polygon has to be drawn in each pass. If the second 3D texture is not available, several thousand polygons have to be drawn instead. If, however, a 2D texture is available concurrently with the 3D texture, it should be possible to adapt the approach for rendering oblique slices with trilinear interpolation described by Rezk-Salama et al. [19] to our method. In this case, instead of clipping the slice against all voxels, clipping would only need to be done against a set of planes orthogonal to a single axis. This would reduce the number of polygons needed to the resolution of the volume along that axis (e.g., 128).
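A skeletal version of this fall-back path might look as follows. This is a sketch under stated assumptions, not the authors' code: clipSliceToVoxel(), volumeSample(), and tileCoord() are hypothetical helpers standing in for the on-the-fly polygon generation described above, and only a single one of the 64 passes is shown:

```cpp
#include <GL/gl.h>
#include <GL/glext.h>   // GL_TEXTURE_3D (core in OpenGL 1.2 / EXT_texture3D)
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical helpers, implemented elsewhere:
std::vector<Vec3> clipSliceToVoxel(int i, int j, int k);  // slice polygon in voxel (i,j,k)
float             volumeSample(int i, int j, int k);      // point-sampled input volume
Vec3              tileCoord(const Vec3& p);               // position -> filter tile coords

extern GLuint tileTex3D;                 // current 3D filter tile, trilinearly filtered
extern int    dimX, dimY, dimZ;          // volume resolution

void renderOneFallbackPass()
{
    glEnable(GL_TEXTURE_3D);
    glBindTexture(GL_TEXTURE_3D, tileTex3D);
    glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE);  // color * kernel value

    for (int k = 0; k < dimZ; ++k)
      for (int j = 0; j < dimY; ++j)
        for (int i = 0; i < dimX; ++i) {
            std::vector<Vec3> poly = clipSliceToVoxel(i, j, k);
            if (poly.empty()) continue;  // voxel not intersected by the slice

            float v = volumeSample(i, j, k);
            glColor3f(v, v, v);          // flat shading replaces the missing 2nd 3D texture

            glBegin(GL_POLYGON);
            for (const Vec3& p : poly) {
                Vec3 t = tileCoord(p);
                glTexCoord3f(t.x, t.y, t.z);
                glVertex3f(p.x, p.y, p.z);
            }
            glEnd();
        }
}
```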
3.5 Volume rendering

Since we are able to reconstruct axis-aligned slices, as well as arbitrarily oriented slices, with our high-quality filtering approach, our technique can also be used for direct volume rendering, using one of the two major approaches of DVR exploiting texture mapping hardware. The first possibility is to render the volume by blending a stack of object-aligned slices on top of each other. Each one of these slices would be reconstructed with the high-quality 2D filtering approach outlined in section 3.3. All of these slices would then be blended via back-to-front compositing. The second possibility is to render the volume by blending a stack of viewport-aligned slices on top of each other, each one of them individually reconstructed with the 3D filtering approach of section 3.4. Analogously to the standard approach using 3D texture mapping hardware and trilinear interpolation, our method in this case requires the capability to use 3D textures, but achieves higher reconstruction quality, since the individual slices can be reconstructed with high-order filter kernels. Furthermore, our approach can also be used to reconstruct gradients in high quality, in addition to reconstructing density values. This is possible in combination with hardware-accelerated methods that store gradients in the RGB components of a texture [9, 23].
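For the first possibility, the outer loop is ordinary back-to-front compositing. In this sketch, reconstructSlice() is a hypothetical stand-in for the multi-pass 2D filtering of section 3.3, assumed to filter into an off-screen buffer and bind the result (with alpha from a transfer function) as a texture that drawSliceQuad() maps onto a proxy polygon:

```cpp
#include <GL/gl.h>

void reconstructSlice(int k);   // hypothetical: high-quality filtering, section 3.3
void drawSliceQuad(int k);      // hypothetical: textured proxy polygon for slice k

// Direct volume rendering from object-aligned slices, each slice first
// reconstructed with a high-order filter, then composited back to front:
// out = src.a * src + (1 - src.a) * dst.
void renderVolume(int numSlices)
{
    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);

    for (int k = numSlices - 1; k >= 0; --k) {   // back to front
        reconstructSlice(k);                     // multiple filtering passes
        drawSliceQuad(k);                        // composite into the frame buffer
    }
    glDisable(GL_BLEND);
}
```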
4 Commodity graphics hardware issues

This section discusses features and issues of current low-cost 3D graphics accelerators that are required or can be exploited by our method. We are specifically targeting widely available consumer graphics hardware on the PC platform, like the NVIDIA GeForce [15] and the ATI Radeon [2]. The graphics API we are using in our work is OpenGL [17].

4.1 Single-pass multi-texturing

A very important feature of today's graphics hardware is the capability to texture-map a single polygon with more than one texture at the same time (exposed via ARB_multitexture in OpenGL). Usually, the process of multi-texturing is accessed by programming several texture stages, the output of each stage becoming the input to the next. More recent functionality (EXT_texture_env_combine, NV_texture_env_combine4, etc.) adds a lot more flexibility to this model, which is especially true for NVIDIA's extensions for register combiners (NV_register_combiners, NV_register_combiners2) and texture shaders (NV_texture_shader, NV_texture_shader2), the latter only having been introduced with the GeForce 3 [16]. The de-facto standard on current graphics hardware is two-texture multi-texturing, where two textures can be used simultaneously. However, recent hardware like the ATI Radeon and the GeForce 3 are already able to use three and four textures, respectively. This can be exploited in order to reduce the number of rendering passes required.
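A practical consequence is that the pass count can be chosen at startup from the number of texture units the driver reports. The helper below is an assumption about how one might organize this, not part of the original method; note that combining several (input, tile) pairs in one pass additionally requires combiner functionality such as the extensions listed above, since the plain texture-stage cascade only multiplies:

```cpp
#include <GL/gl.h>
#include <GL/glext.h>

// Decide how many rendering passes a filter needs on the available hardware.
// With 2n simultaneous textures, n (input, tile) pairs fit into one pass,
// so the pass count shrinks by a factor of n (see section 3.2).
int numPassesForKernel(int numFilterTiles)
{
    GLint units = 1;
    glGetIntegerv(GL_MAX_TEXTURE_UNITS_ARB, &units);  // ARB_multitexture query

    int pairsPerPass = units / 2;                     // one pair = input + tile
    if (pairsPerPass < 1) pairsPerPass = 1;

    return (numFilterTiles + pairsPerPass - 1) / pairsPerPass;  // round up
}
```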
Figure 6: Filter kernels we have used: (a) Cubic B-spline and Catmull-Rom spline; (b) Blackman windowed sinc (width = 4), also depicting the Blackman window itself.

4.2 3D texturing

For a long time, 3D textures have only been available on workstation-class graphics hardware [1]. Recently, however, they have also entered the realm of consumer graphics hardware, e.g., with the ATI Radeon. 3D textures have a number of uses, ranging from solid texturing to direct volume rendering. Analogously to the standard bilinear interpolation performed by 2D texture mapping hardware, the three-dimensional counterpart uses trilinear interpolation within the texture volume in order to reconstruct the texture at the desired locations. Direct volume rendering is easily able to exploit 3D texture mapping capabilities by blending a stack of 2D slices on top of each other [23]. These slices are usually aligned with the viewport and therefore have an arbitrary location and orientation with respect to the texture domain itself. Although 3D textures are not yet widely supported in the consumer marketplace, they will very likely become a standard feature in the immediate future. The method proposed in this paper employs three-dimensional texture mapping capabilities for reconstructing arbitrarily oriented slices through volumetric data.
4.3 Frame buffer precision and range

Practically all rendering algorithms employing multiple passes are prone to artifacts due to limited frame buffer precision. In addition to the quite obvious problem of insufficient precision, the integer or fixed-point nature of most frame buffers, together with the usual [0,1] range, removes a lot of flexibility with regard to storing intermediate results. Frame buffers are usually employed to store the final image of a rendered scene. Hence, a frame buffer precision of eight bits per color channel easily suffices in a standard scenario, although even then the limited range can lead to problems with accurately displaying certain static images. However, if the frame buffer is used for holding the result of intermediate computations, or for blending many textures on top of each other, accumulating color information over many passes, a resolution of eight bits is often too low. Unfortunately, the frame buffers of all currently available consumer 3D accelerators are still limited to at most eight bits of precision per color channel. The usual approach to avoiding excessive artifacts due to limited frame buffer precision is to calculate intermediate results with scaled-up values. This gives rise to the problem of unintentional clamping, where values exceed the [0,1] range of the frame buffer and are therefore clamped to 1.0. With care, this can be avoided by reading back intermediate results after a couple of passes and temporarily storing them. After all the intermediate results have been calculated, they can be combined into the final image in the frame buffer, scaling them down on-the-fly.

4.4 Frame buffer subtraction

Another problem resulting from the [0,1] range of current frame buffers is that negative numbers cannot be represented directly. Usually, this limited range permeates the entire graphics pipeline; even on the latest hardware like the NVIDIA GeForce 3, which offers texture formats with signed values (SIGNED_RGB_NV, SIGNED_ALPHA_NV, etc.), the result of a single pass is clamped to [0,1] before it is combined with the contents of the frame buffer. Applications that need to subtract from the frame buffer cannot do this by simply using negative numbers.

In OpenGL, at least explicit subtraction is supported in principle, via glBlendEquationEXT(GL_FUNC_REVERSE_SUBTRACT_EXT). Unfortunately, not all graphics accelerators support this functionality. Even if explicit subtraction is supported by the hardware and exported by the API, it is not possible to switch between addition and subtraction on a per-pixel basis, i.e., the behavior that could naturally be achieved by using signed numbers. Frame buffer subtraction is a part of the imaging subset of OpenGL 1.2. The entire subset is usually only supported in software. Thus, NVIDIA GeForce cards, for example, offer this capability separately, via EXT_blend_subtract.

Due to the [0,1] range of the frame buffer, care also has to be taken with respect to the order of rendering passes, in order to avoid the possibility of intermediate negative results. Usually, all positive passes are rendered first. Then, the absolute values of negative passes can be subtracted from the contents of the frame buffer.

In our case, the requirement for signed numbers, or at least the possibility of explicit subtraction (which in our method suffices for all practical uses), depends on the filter kernel that one wants to use. In the simplest case of entirely positive kernels, all intermediate and end results are positive as well. If the filter kernel is partially negative, it can easily be used as long as all zero crossings are at integer positions. If this is not the case, filter tiles containing entries of differing sign need to be split into two separate tiles. The first tile must contain the positive entries, with all negatives set to zero. Conversely, the second tile contains the absolute values of the negative entries, with all positives set to zero. These two split-up tiles have to be processed in separate rendering passes, increasing the overall number of passes. Fortunately, kernels that switch between negative and positive areas at arbitrary fractional locations are not very common. For instance, all practical interpolatory filters that are also separable satisfy the criterion that all zero crossings are at integer positions. See figure 6 for the filter kernels we have used as an example. All of them change sign at the integers, if at all.
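The tile handling described above translates into two small pieces of code: splitting a signed tile into its positive and negative parts when the tile textures are built, and switching the blend equation for the negative passes. A sketch, assuming EXT_blend_subtract / EXT_blend_minmax are available and the extension function pointers are resolved:

```cpp
#include <GL/gl.h>
#include <GL/glext.h>   // GL_FUNC_ADD_EXT, GL_FUNC_REVERSE_SUBTRACT_EXT

// Split a signed filter tile into two non-negative tiles: 'pos' keeps the
// positive entries, 'neg' the absolute values of the negative entries.
void splitTile(const float* tile, int n, float* pos, float* neg)
{
    for (int i = 0; i < n; ++i) {
        pos[i] = tile[i] > 0.0f ?  tile[i] : 0.0f;
        neg[i] = tile[i] < 0.0f ? -tile[i] : 0.0f;
    }
}

// Positive passes are accumulated first; negative passes are then subtracted
// from the frame buffer. GL_FUNC_REVERSE_SUBTRACT_EXT computes
// result = destination - source, i.e., frame buffer minus incoming fragment.
void setBlendForPass(bool negativePass)
{
    glEnable(GL_BLEND);
    glBlendFunc(GL_ONE, GL_ONE);
    glBlendEquationEXT(negativePass ? GL_FUNC_REVERSE_SUBTRACT_EXT
                                    : GL_FUNC_ADD_EXT);
}
```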
Figure 7: Slice of MR head data set, reconstructed using: (a) Bilinear interpolation; (b) Blackman windowed sinc; (c) Bicubic Catmull-Rom spline; (d) Bicubic B-spline.

                         Bicubic B-spline   Bicubic Catmull-Rom   Tricubic B-spline
kernel width             4                  4                     4
slice orientation        aligned            aligned               oblique
number of passes         16                 16                    64
slice resolution         256x256            256x256               128x128
render resolution        600x600            600x600               500x500
NVIDIA GeForce 2 [fps]   10.88              10.88                 -
ATI Radeon [fps]         8.91               8.91                  1.24

Table 1: Frame rates for different scenarios. Note that the tricubic case uses the fall-back algorithm.

5 Results

In our work, we are focusing on widely available low-cost PC graphics hardware. Currently, the two premier graphics accelerators in this field are the NVIDIA GeForce 2 and the ATI Radeon. We have implemented and tested our method on a GeForce 2 with 32MB RAM, and a Radeon with 64MB RAM. The graphics API we are using is OpenGL. Although the GeForce 3 is already on the horizon, it is not widely available yet.

The GeForce 2 supports multi-texturing with two 2D textures at the same time and offers the capability to subtract from the contents of the frame buffer. It does not support 3D textures, however. The Radeon supports up to three simultaneous 2D textures, as well as one 3D texture and one 2D texture at the same time. Unfortunately, it does not allow subtracting from the frame buffer. These properties of the graphics hardware we have used lead to a couple of restrictions with respect to which parts of our approach can be implemented on which platform.

As filter kernels we have used a cubic B-spline of width four and a cubic Catmull-Rom spline, also of width four. In addition to these, we have also tested a windowed sinc filter kernel.

We have reconstructed object-aligned planar slices on both the GeForce 2 and the Radeon. Figure 7(a) shows a part of a 256x256 slice from a human head MR data set. The image was rendered at a resolution of 600x600. In this figure, reconstruction was done with the hardware-native bilinear interpolation. Thus, interpolation artifacts are clearly visible when viewed on screen. The same slice under the same conditions but with different reconstruction kernels is shown in figures 7(b) (Blackman windowed sinc), 7(c) (cubic Catmull-Rom), and 7(d) (cubic B-spline). The Blackman windowed sinc and Catmull-Rom kernels can only be used on the GeForce, due to the fact that the Radeon does not support frame buffer subtraction. Figure 9 (colorplate) shows slices reconstructed with different filters together with magnified regions to highlight the differences.

Since the GeForce 2 does not support 3D textures, we have tested our approach for reconstructing arbitrarily oriented slices only on the Radeon. Figure 8 shows a slice from a vertebrae data set with screws. The slice is not aligned with the slice stack, but instead has an arbitrary location and orientation with respect to the texture domain. Since the Radeon does not support multi-texturing with two 3D textures, we have used the fall-back approach for this case outlined in section 3.4.

Figure 8: Oblique slice through vertebrae data set with screws: (a) Trilinear filter; (b) Part of (a) magnified; (c) Tricubic B-spline filter; (d) Part of (c) magnified.

Table 1 shows some timing results of our test implementation. We have used a Pentium III, 733 MHz, with 512MB of RAM, and the two graphics cards described above (GeForce 2 and Radeon). Note that the timings do not depend on the shape of the filter kernel per se, only on its width.

6 Conclusions and future work

We have presented a general approach for high-quality filtering that is able to exploit hardware acceleration for reconstruction with arbitrary filter kernels. Conceptually, the method is not constrained to a certain dimensionality of the data, or to the shape of the filter kernel. In practice, limiting factors are the number of rendering passes and the precision of the frame buffer.

Our method is quite general and can be used for a lot of applications. With regard to volume visualization, the reconstruction of object-aligned, as well as oblique, slices through volumetric data is especially interesting. Reconstruction of slices can also be used for direct volume rendering.

We are exploiting commodity graphics hardware, multi-texturing, and multiple rendering passes. The number of passes is a major factor determining the resulting performance. Therefore, future hardware that supports many textures at the same time, not only 2D textures but also 3D textures, will make the application of our method more feasible for real-time use. Since hardware that is able to combine multi-texturing with 3D textures is not yet available to us, we are still using a "simulation" of our algorithm for reconstructing oblique slices that is much slower. Thus, we are really looking forward to the immediate future when such hardware will be available. Still, we would also like to adapt the approach outlined by Rezk-Salama et al. [19] in order to considerably speed up the intermediate solution. We would also like to experiment with additional filter kernels. Please see http://www.VRVis.at/vis/research/hq-hw-reco/ for high-resolution images and the most up-to-date information on this ongoing work.

7 Acknowledgments

Parts of this work have been carried out as part of the basic research on visualization at the VRVis Research Center in Vienna, Austria (http://www.VRVis.at/vis/), which is funded in part by an Austrian research program called Kplus.

References

[1] K. Akeley. RealityEngine graphics. In Proceedings of SIGGRAPH '93, pages 109-116, 1993.

[2] ATI web page. http://www.ati.com/.

[3] B. Cabral, N. Cam, and J. Foran. Accelerated volume rendering and tomographic reconstruction using texture mapping hardware. In Proceedings of IEEE Symposium on Volume Visualization, pages 91-98, 1994.

[4] T. J. Cullip and U. Neumann. Accelerating volume reconstruction with 3D texture mapping hardware. Technical Report TR93-027, Department of Computer Science, University of North Carolina, Chapel Hill, 1993.

[5] F. Dachille, K. Kreeger, B. Chen, I. Bitter, and A. Kaufman. High-quality volume rendering using texture mapping hardware. In Proceedings of Eurographics/SIGGRAPH Graphics Hardware Workshop 1998, 1998.

[6] M. Hopf and T. Ertl. Accelerating 3D convolution using graphics hardware. In Proceedings of IEEE Visualization '99, pages 471-474, 1999.

[7] R. G. Keys. Cubic convolution interpolation for digital image processing. IEEE Trans. Acoustics, Speech, and Signal Processing, ASSP-29(6):1153-1160, December 1981.

[8] S. R. Marschner and R. J. Lobb. An evaluation of reconstruction filters for volume rendering. In Proceedings of IEEE Visualization '94, pages 100-107, 1994.

[9] M. Meißner, U. Hoffmann, and W. Straßer. Enabling classification and shading for 3D texture mapping based volume rendering. In Proceedings of IEEE Visualization '99, pages 207-214, 1999.

[10] M. Meißner, J. Huang, D. Bartz, K. Müller, and R. Crawfis. A practical evaluation of four popular volume rendering algorithms. In Proceedings of IEEE Symposium on Volume Visualization, pages 81-90, 2000.

[11] D. P. Mitchell and A. N. Netravali. Reconstruction filters in computer graphics. In Proceedings of SIGGRAPH '88, pages 221-228, 1988.

[12] T. Möller, R. Machiraju, K. Müller, and R. Yagel. Classification and local error estimation of interpolation and derivative filters for volume rendering. In Proceedings of IEEE Symposium on Volume Visualization, pages 71-78, 1996.

[13] T. Möller, R. Machiraju, K. Müller, and R. Yagel. Evaluation and design of filters using a Taylor series expansion. IEEE Transactions on Visualization and Computer Graphics, 3(2):184-199, 1997.

[14] T. Möller, K. Müller, Y. Kurzion, R. Machiraju, and R. Yagel. Design of accurate and smooth filters for function and derivative reconstruction. In Proceedings of IEEE Symposium on Volume Visualization, pages 143-151, 1998.

[15] NVIDIA web page. http://www.nvidia.com/.

[16] NVIDIA OpenGL extension specifications document. http://www.nvidia.com/developer.

[17] OpenGL web page. http://www.opengl.org/.

[18] A. V. Oppenheim and R. W. Schafer. Digital Signal Processing. Prentice Hall, Englewood Cliffs, 1975.

[19] C. Rezk-Salama, K. Engel, M. Bauer, G. Greiner, and T. Ertl. Interactive volume rendering on standard PC graphics hardware using multi-textures and multi-stage rasterization. In Proceedings of Eurographics/SIGGRAPH Graphics Hardware Workshop 2000, 2000.

[20] Silicon Graphics, Inc. Pixel textures extension. Specification available from http://www.opengl.org, 1996.

[21] T. Theußl, H. Hauser, and M. E. Gröller. Mastering windows: Improving reconstruction. In Proceedings of IEEE Symposium on Volume Visualization, pages 101-108, 2000.

[22] K. Turkowski. Filters for common resampling tasks. In Andrew S. Glassner, editor, Graphics Gems I, pages 147-165. Academic Press, 1990.

[23] R. Westermann and T. Ertl. Efficiently using graphics hardware in volume rendering applications. In Proceedings of SIGGRAPH '98, pages 169-178, 1998.
Figure 9: Slice from MR data set, reconstructed using different filters (right images show shaded portion of left image enlarged): (a) Bilinear filter; (b) Blackman windowed sinc; (c) Bicubic Catmull-Rom spline; (d) Bicubic B-spline.
