Hardware Accelerated High Quality Recons
reconstruction filters [14]. Turkowski [22] used windowed ideal reconstruction filters for image resampling tasks. Theußl et al. [21]

g(x) = f[x] ∗ h(x) = Σ_{i=⌊x⌋−m+1}^{⌊x⌋+m} f[i] · h(x − i)    (1)

Figure 1: Gathering vs. distribution of input sample contributions (tent filter): (a) gathering all contributions to a single output sample; (b) distributing a single input sample's contribution (tent filter not shown).
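The two evaluation orders of equation (1) shown in figure 1 can be illustrated with a minimal NumPy sketch (CPU code, not the paper's GPU implementation; the tent kernel and sample data are made up). Gathering computes each output sample by summing all contributing inputs, while distribution loops over relative offsets — one per filter tile — and accumulates each input sample's contribution into all outputs at once, which is the order the multi-pass algorithm uses:

```python
import numpy as np

def tent(t):
    """Tent (linear) reconstruction kernel, width 2 (m = 1)."""
    return np.maximum(0.0, 1.0 - np.abs(t))

f = np.array([0.0, 1.0, 0.5, 0.8, 0.2])   # input samples f[i]
xs = np.linspace(0.0, 3.9, 40)            # output sample positions
m = 1                                     # kernel half-width

def gather(f, xs, h, m):
    """g(x) = sum_{i=floor(x)-m+1}^{floor(x)+m} f[i] * h(x - i)."""
    g = np.zeros_like(xs)
    for k, x in enumerate(xs):
        lo, hi = int(np.floor(x)) - m + 1, int(np.floor(x)) + m
        for i in range(lo, hi + 1):
            if 0 <= i < len(f):
                g[k] += f[i] * h(x - i)
    return g

def distribute(f, xs, h, m):
    """One 'pass' per relative offset (= filter tile); each pass adds
    f[floor(x) + offset] * h(x - floor(x) - offset) to every output."""
    g = np.zeros_like(xs)
    for offset in range(-m + 1, m + 1):   # 2m passes for a width-2m kernel
        i = np.floor(xs).astype(int) + offset
        valid = (i >= 0) & (i < len(f))
        g[valid] += f[i[valid]] * h(xs[valid] - i[valid])
    return g

# Only the order of summation changes, so both orders agree exactly.
assert np.allclose(gather(f, xs, tent, m), distribute(f, xs, tent, m))
```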
has width four and thus consists of four filter tiles. We calculate the contribution of a single specific filter tile to all output samples in a single pass. The input samples used in a single pass correspond to a specific relative input sample location, or offset, with regard to the output sample locations. That is, in one pass the input samples with relative offset zero are used for all output samples, then the samples with offset one in the next pass, and so on. The number of passes necessary is equal to the number of filter tiles the filter kernel consists of. Note that the correspondence of passes and filter tiles shown in figure 4 is not simply left-to-right; it reflects the order we use in our actual implementation in order to avoid unintentional clamping of intermediate results in the frame buffer (see section 4.4).

Remember that there are two basic inputs needed by the convolution sum: the input samples and the filter kernel. Since we change only the order of summation but leave the multiplication untouched, we need both available at the same time. Therefore, we employ multi-texturing with (at least) two textures and retrieve input samples from the first texture and filter kernel values from the second texture. Since only a single filter tile is needed during a single rendering pass, all tiles are stored and downloaded to the graphics hardware as separate textures. The required replication of tiles over the output sample grid is easily achieved by configuring the hardware to automatically extend the texture domain beyond [0,1] x [0,1] by simply repeating the texture1. In order to fetch input samples in unmodified form, nearest-neighbor interpolation has to be used for the input texture. The textures containing the filter tiles are sampled using the hardware-native linear interpolation.

If a given hardware architecture is able to support 2n textures at the same time, the number of passes can be reduced by a factor of n. That is, with two-texture multi-texturing four passes are needed for filtering with a cubic kernel in one dimension, whereas with four-texture multi-texturing only two passes are needed, etc.

Our approach is not limited to symmetric filter kernels, although symmetry can be exploited in order to save texture memory for the filter tile textures. It is also not limited to separable filter kernels (in two and three dimensions, respectively), as will be shown in the following sections.

1 In OpenGL, via the texture wrap mode GL_REPEAT.

Figure 4: Catmull-Rom spline of width four used for reconstruction of a one-dimensional function in four passes.

Reconstruction of images and aligned slices

When enlarging images or reconstructing object-aligned slices through volumetric data taken directly from a stack of such slices, high-order two-dimensional filters have to be used in order to achieve high-quality results.

The basic algorithm outlined in the previous section for one dimension can easily be applied in two dimensions, exploiting two-texture multi-texturing hardware in multiple rendering passes. For each output pixel and pass, our method takes two inputs: unmodified (i.e., unfiltered) image values, and filter kernel values. That is, two 2D textures are used simultaneously. One texture contains the entire source image, and the other texture contains the filter tile needed in the current pass, which in this case is a two-dimensional unit square. Figure 5 shows a two-dimensional filter kernel texture. For this image, a bicubic B-spline filter has been used. All sixteen tiles are shown, whereas in reality only four tiles would actually be downloaded to the hardware. Due to symmetry, the other twelve tiles can easily be generated on-the-fly by mirroring texture coordinates. Figure 6 shows one-dimensional cross-sections of additional filter kernels that we have used.

In addition to using the appropriate filter tile, in each pass an appropriate offset has to be applied to the texture coordinates of the texture containing the input image. As explained in the previous section, each pass corresponds to a specific relative location of an input sample. Thus, the slice texture coordinates have to be offset and scaled in order to match the point-sampled input image grid with the grid of replicated filter tiles. In the case of a cubic filter kernel for bicubic filtering, sixteen passes need to be performed on two-texture multi-texturing hardware.
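The sixteen-pass bicubic scheme can be sketched in NumPy (a CPU emulation of the pass structure, not the actual OpenGL implementation; the image data and output grid are made up). Each pass multiplies the input samples at one fixed relative offset by one kernel tile and accumulates the product, exactly as the hardware would blend passes into the frame buffer:

```python
import numpy as np

def bspline3(t):
    """Cubic B-spline reconstruction kernel, width four (four tiles per axis)."""
    t = np.abs(t)
    return np.where(t < 1.0, (4.0 - 6.0 * t**2 + 3.0 * t**3) / 6.0,
                    np.where(t < 2.0, (2.0 - t)**3 / 6.0, 0.0))

rng = np.random.default_rng(1)
img = rng.random((16, 16))             # point-sampled source image (texture 1)

# Output sample positions magnifying the interior of the image.
xs = np.linspace(2.0, 12.9, 44)
X, Y = np.meshgrid(xs, xs, indexing="ij")
FX, FY = np.floor(X).astype(int), np.floor(Y).astype(int)

# Sixteen passes, one per 2D filter tile: every output pixel reads the input
# sample at one fixed relative offset (i, j) and the matching kernel tile
# value (texture 2), multiplies them, and adds the product to the result.
out = np.zeros_like(X)
wsum = np.zeros_like(X)
for i in range(-1, 3):
    for j in range(-1, 3):
        tile = bspline3(X - (FX + i)) * bspline3(Y - (FY + j))
        out += img[FX + i, FY + j] * tile   # additive frame-buffer blend
        wsum += tile

# The B-spline tiles form a partition of unity, and the accumulated passes
# evaluate the full bicubic convolution sum.
assert np.allclose(wsum, 1.0)
assert np.isclose(out[0, 0], sum(img[2 + i, 2 + j]
                                 * bspline3(2.0 - (2 + i)) * bspline3(2.0 - (2 + j))
                                 for i in range(-1, 3) for j in range(-1, 3)))
```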
As mentioned in the previous section, our approach is neither limited to symmetric nor separable filters. However, non-separable filter kernels may need additional passes if they contain both positive and negative areas, see section 4.4. Apart from this, separable as well as non-separable filter kernels are simply stored in two-dimensional filter tile textures.

Figure 5: Bicubic B-spline filter kernel; filter tiles separated by white lines.

Reconstruction of oblique slices

When planar slices through 3D volumetric data are allowed to be located and oriented arbitrarily, three-dimensional filtering has to be performed, although the result is still two-dimensional. On graphics hardware, this is usually done by trilinearly interpolating within a 3D texture. Our method can also be applied in this case in order to improve reconstruction quality considerably. The conceptually straightforward extension of the 2D approach described in the previous section, which simultaneously uses two 2D textures, achieves the equivalent for three-dimensional reconstruction by simultaneously using two 3D textures. The first 3D texture contains the input volume in its entirety, whereas the second 3D texture contains the current filter tile, which in this case is a three-dimensional unit cube. In the case of a cubic filter kernel for tricubic filtering, 64 passes need to be performed on two-texture multi-texturing hardware. If such a kernel is symmetric, we download eight 3D textures for the filter tiles, generating the remaining 56 without any performance loss by mirroring texture coordinates. Due to the high memory consumption of 3D textures, it is especially important that the filter kernel need not be downloaded to the graphics hardware in its entirety if it is symmetric. Texture compression can also be used in order to minimize on-board texture memory consumption. Various compression schemes are available for this purpose [16].

Unfortunately, current hardware (especially the ATI Radeon) does not support multi-texturing with two 3D textures. Although the Radeon has three texture units, the most it can do is multi-texturing with one 3D texture and one 2D texture at the same time. Since we need data from two 3D textures available simultaneously, the missing functionality has to be emulated in this case. Conceptually, this is quite simple. It incurs a major performance penalty, however, and thus has only the character of a proof of concept. Assuming that only a single 3D texture is available, we provide the second input (after the filter kernel) to the filter convolution sum as the color of flat-shaded polygons. This is possible since the input volume need only be point-sampled. We generate these polygons on-the-fly by intersecting the voxel grid of the input volume with the plane of the slice we want to reconstruct. Each clipped part of the slice that is entirely contained within a single voxel is rendered as a polygon of a single color, the color being manually retrieved from the input volume and assigned to the polygon. At the same time, the current filter tile is activated as a 3D texture filtered via trilinear interpolation. For each output pixel and pass, the color of the polygon (which would normally be fetched from the second 3D texture that is not available) is multiplied by the kernel value fetched from the 3D texture and stored into the frame buffer. This approach ultimately yields the exact same result as multi-texturing with two 3D textures, but of course uses a lot more polygons. In the multi-texture approach, just a single polygon has to be drawn in each pass. If the second 3D texture is not available, several thousand polygons have to be drawn instead. If, however, a 2D texture is available concurrently with the 3D texture, it should be possible to adapt the approach for rendering oblique slices with trilinear interpolation described by Rezk-Salama et al. [19] to our method. In this case, instead of clipping the slice against all voxels, clipping would only need to be done against a set of planes orthogonal to a single axis. This would reduce the number of polygons needed to the resolution of the volume along that axis (e.g., 128).

Volume rendering

Since we are able to reconstruct axis-aligned slices, as well as arbitrarily oriented slices, with our high-quality filtering approach, our technique can also be used for direct volume rendering, using one of the two major approaches of DVR exploiting texture mapping hardware. The first possibility is to render the volume by blending a stack of object-aligned slices on top of each other. Each one of these slices would be reconstructed with the high-quality 2D filtering approach outlined in section 3.3. All of these slices would then be blended via back-to-front compositing. The second possibility is to render the volume by blending a stack of viewport-aligned slices on top of each other, each one of them individually reconstructed with the 3D filtering approach of section 3.4. Analogously to the standard approach using 3D texture mapping hardware and trilinear interpolation, our method in this case requires the capability to use 3D textures, but achieves higher reconstruction quality, since the individual slices can be reconstructed with high-order filter kernels. Furthermore, our approach can also be used to reconstruct gradients in high quality, in addition to reconstructing density values. This is possible in combination with hardware-accelerated methods that store gradients in the RGB components of a texture [9, 23].

4 Commodity graphics hardware issues

This section discusses features and issues of current low-cost 3D graphics accelerators that are required or can be exploited by our method. We are specifically targeting widely available consumer graphics hardware on the PC platform, like the NVIDIA GeForce [15] and the ATI Radeon [2]. The graphics API we are using in our work is OpenGL [17].

Single-pass multi-texturing

A very important feature of today's graphics hardware is the capability to texture-map a single polygon with more than one texture at the same time2. Usually, the process of multi-texturing is accessed by programming several texture stages, the output of each stage becoming the input to the next. More recent functionality3 adds a lot more flexibility to this model, which is especially true for NVIDIA's extensions for register combiners4 and texture shaders5, the latter only having been introduced with the GeForce 3 [16]. The practical de-facto standard on current graphics hardware is two-texture multi-texturing, where two textures can be used simultaneously. However, recent hardware like the ATI Radeon and the GeForce 3 are already able to use three and four textures, respectively. This can be exploited in order to reduce the number of rendering passes required.

2 Exposed via ARB_multitexture in OpenGL.
3 EXT_texture_env_combine, NV_texture_env_combine4, etc.
4 NV_register_combiners, NV_register_combiners2.
5 NV_texture_shader, NV_texture_shader2.
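The pass counts quoted above (four for cubic 1D filtering, sixteen for bicubic, 64 for tricubic, halved by four-texture multi-texturing) follow one simple formula, sketched here as a small hypothetical helper; the function name and signature are ours, not from the paper:

```python
import math

def num_passes(filter_width, dimensions, simultaneous_textures=2):
    """Rendering passes needed for the multi-pass filtering scheme.

    A kernel of the given width has filter_width**dimensions filter tiles;
    each pass consumes one (input texture, tile texture) pair, so hardware
    with 2n simultaneous textures can process n tiles per pass.
    """
    tiles = filter_width ** dimensions
    pairs_per_pass = max(1, simultaneous_textures // 2)
    return math.ceil(tiles / pairs_per_pass)

# Cubic kernel (width four) on two-texture hardware:
assert num_passes(4, 1) == 4     # one-dimensional filtering
assert num_passes(4, 2) == 16    # bicubic
assert num_passes(4, 3) == 64    # tricubic
# Four-texture multi-texturing (2n = 4, n = 2) halves the pass count:
assert num_passes(4, 1, simultaneous_textures=4) == 2
assert num_passes(4, 3, simultaneous_textures=4) == 32
```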
Figure 6: Filter kernels we have used: (a) Cubic B-spline and Catmull-Rom spline; (b) Blackman windowed sinc (width = 4), depicting also the Blackman window itself.
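The kernels plotted in figure 6 can be written down directly; a short sketch with the standard piecewise definitions (the Blackman window coefficients 0.42/0.5/0.08 are the usual textbook values, assumed here rather than taken from the paper):

```python
import math

def catmull_rom(t):
    """Catmull-Rom spline kernel, width four (interpolating)."""
    t = abs(t)
    if t < 1.0:
        return 1.5 * t**3 - 2.5 * t**2 + 1.0
    if t < 2.0:
        return -0.5 * t**3 + 2.5 * t**2 - 4.0 * t + 2.0
    return 0.0

def blackman_sinc(t, width=4):
    """Ideal sinc windowed by a Blackman window of the given width."""
    half = width / 2.0
    if abs(t) >= half:
        return 0.0
    window = (0.42 + 0.5 * math.cos(math.pi * t / half)
                   + 0.08 * math.cos(2.0 * math.pi * t / half))
    sinc = 1.0 if t == 0.0 else math.sin(math.pi * t) / (math.pi * t)
    return sinc * window

# Both kernels are interpolating: 1 at zero and 0 at all other integers,
# so they pass through the original samples (unlike the cubic B-spline,
# which merely approximates them).
for h in (catmull_rom, blackman_sinc):
    assert math.isclose(h(0.0), 1.0)
    for t in (-1.0, 1.0, 2.0):
        assert abs(h(t)) < 1e-12
```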
have also entered the realm of consumer graphics hardware, e.g., with the ATI Radeon. 3D textures have a number of uses, ranging from solid texturing to direct volume rendering. Analogously to the standard bilinear interpolation performed by 2D texture mapping hardware, the three-dimensional counterpart uses trilinear interpolation within the texture volume in order to reconstruct the texture at the desired locations. Direct volume rendering is easily able to exploit 3D texture mapping capabilities by blending a stack of 2D slices on top of each other [23]. These slices are usually aligned with the viewport and therefore have an arbitrary location and orientation with respect to the texture domain itself. Although 3D textures are not yet widely supported in the consumer marketplace, they will very likely become a standard feature in the immediate future. The method proposed in this paper employs three-dimensional texture mapping capabilities for reconstructing arbitrarily oriented slices through volumetric data.

buffers is that negative numbers cannot be represented directly. Usually, this limited range permeates the entire graphics pipeline, but even on the latest hardware like the NVIDIA GeForce 3, which offers texture formats with signed values6, the result of a single pass is clamped to [0,1] before it is combined with the contents of the frame buffer. Applications that need to subtract from the frame buffer cannot do this by simply using negative numbers.

In OpenGL, at least explicit subtraction is supported in principle7. Unfortunately, not all graphics accelerators support this functionality. Even if explicit subtraction is supported by the hardware and exported by the API, it is not possible to switch between addition and subtraction on a per-pixel basis, i.e., the behavior that could naturally be achieved by using signed numbers. Frame buffer subtraction is part of the imaging subset of OpenGL 1.2. The entire subset is usually only supported in software. Thus, NVIDIA GeForce cards, for example, offer this capability8 separately.

Due to the [0,1] range of the frame buffer, care also has to be

Figure 7: Slice of MR head data set, reconstructed using: (a) Bilinear interpolation; (b) Blackman windowed sinc; (c) Bicubic Catmull-Rom spline; (d) Bicubic B-spline.
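The effect of per-pass clamping on kernels with negative lobes, and the workaround of accumulating positive and negative kernel parts in separate passes combined by one frame-buffer subtraction, can be illustrated with a small sketch (the per-pass contribution values are made up; this emulates the principle, not any specific hardware path):

```python
import math

def clamp01(x):
    """Per-pass frame-buffer clamp to [0, 1]."""
    return min(max(x, 0.0), 1.0)

# Signed per-pass contributions, as produced by a kernel with negative
# lobes (e.g. Catmull-Rom). The values are made up for illustration.
contributions = [-0.3, 0.3, 0.5, -0.1]
exact = sum(contributions)                 # 0.4

# Naive accumulation in a clamped frame buffer loses negative
# contributions: the very first pass clamps -0.3 to 0, so the result
# is wrong.
fb = 0.0
for c in contributions:
    fb = clamp01(fb + c)
assert not math.isclose(fb, exact)         # fb ends up at 0.7

# Workaround: accumulate the positive and the negative parts of the
# kernel in separate additive passes, then subtract once at the end.
pos = neg = 0.0
for c in contributions:
    if c >= 0.0:
        pos = clamp01(pos + c)
    else:
        neg = clamp01(neg - c)
result = clamp01(pos - neg)
assert math.isclose(result, exact)
```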
Table 1: Frame rates for different scenarios. Note that the tricubic case uses the fall-back algorithm.

5 Results

In our work, we are focusing on widely available low-cost PC graphics hardware. Currently, the two premier graphics accelerators in this field are the NVIDIA GeForce 2 and the ATI Radeon. We have implemented and tested our method on a GeForce 2 with 32MB RAM, and a Radeon with 64MB RAM. The graphics API we are using is OpenGL. Although the GeForce 3 is already on the horizon, it is not widely available yet.

The GeForce 2 supports multi-texturing with two 2D textures at the same time and offers the capability to subtract from the contents of the frame buffer. It does not support 3D textures, however. The Radeon supports up to three simultaneous 2D textures, as well as one 3D texture and one 2D texture at the same time. Unfortunately, it does not allow subtraction from the frame buffer. These properties of the graphics hardware we have used lead to a couple of restrictions with respect to which parts of our approach can be implemented on which platform.

As filter kernels we have used a cubic B-spline of width four and a cubic Catmull-Rom spline, also of width four. In addition to these, we have also tested a windowed sinc filter kernel.

We have reconstructed object-aligned planar slices on both the GeForce 2 and the Radeon. Figure 7(a) shows a part of a 256x256 slice from a human head MR data set. The image was rendered at a resolution of 600x600. In this figure, reconstruction was done with the hardware-native bilinear interpolation. Thus, interpolation artifacts are clearly visible when viewed on screen. The same slice under the same conditions but with different reconstruction kernels is shown in figures 7(b) (Blackman windowed sinc), 7(c) (cubic Catmull-Rom), and 7(d) (cubic B-spline). The Blackman windowed sinc and Catmull-Rom kernels can only be used on the GeForce, due to the fact that the Radeon does not support frame buffer subtraction. Figure 9 (colorplate) shows slices reconstructed with different filters together with magnified regions to highlight the differences.

Since the GeForce 2 does not support 3D textures, we have tested our approach for reconstructing arbitrarily oriented slices only on the Radeon. Figure 8 shows a slice from a vertebrae data set with screws. The slice is not aligned with the slice stack, but instead has an arbitrary location and orientation with respect to the texture domain. Since the Radeon does not support multi-texturing with two 3D textures, we have used the fall-back approach for this case outlined in section 3.4.

Figure 8: Oblique slice through vertebrae data set with screws: (a) Trilinear filter; (b) Part of (a) magnified; (c) Tricubic B-spline filter; (d) Part of (c) magnified.

Table 1 shows some timing results of our test implementation. We have used a Pentium III, 733 MHz, with 512MB of RAM, and the two graphics cards described above (GeForce 2 and Radeon). Note that the timings do not depend on the shape of the filter kernel per se, only on its width.

6 Conclusions and future work

We have presented a general approach for high-quality filtering that is able to exploit hardware acceleration for reconstruction with arbitrary filter kernels. Conceptually, the method is not constrained to a certain dimensionality of the data, or the shape of the filter kernel. In practice, limiting factors are the number of rendering passes and the precision of the frame buffer.

Our method is quite general and can be used for many applications. With regard to volume visualization, the reconstruction of object-aligned, as well as oblique, slices through volumetric data is especially interesting. Reconstruction of slices can also be used for direct volume rendering.

We are exploiting commodity graphics hardware, multi-texturing, and multiple rendering passes. The number of passes is a major factor determining the resulting performance. Therefore, future hardware that supports many textures at the same time (not only 2D textures, but also 3D textures) will make the application of our method more feasible for real-time use. Since hardware that is able to combine multi-texturing with 3D textures is not yet available to us, we are still using a "simulation" of our algorithm for reconstructing oblique slices that is much slower. Thus, we are really looking forward to the immediate future where such hardware will be available. Still, we would also like to adapt the approach outlined by Rezk-Salama et al. [19] in order to considerably speed up the intermediate solution. We would
[4] T. J. Cullip and U. Neumann. Accelerating volume reconstruction with 3D texture mapping hardware. Technical Report TR93-027, Department of Computer Science, University of North Carolina, Chapel Hill, 1993.

[5] F. Dachille, K. Kreeger, B. Chen, I. Bittner, and A. Kaufman. High-quality volume rendering using texture mapping hardware. In Proceedings of Eurographics/SIGGRAPH Graphics Hardware Workshop 1998, 1998.

[6] M. Hopf and T. Ertl. Accelerating 3D convolution using graphics hardware. In Proceedings of IEEE Visualization '99, pages 471–474, 1999.

[7] R. G. Keys. Cubic convolution interpolation for digital image processing. IEEE Trans. Acoustics, Speech, and Signal Processing, ASSP-29(6):1153–1160, December 1981.

[8] S. R. Marschner and R. J. Lobb. An evaluation of reconstruction filters for volume rendering. In Proceedings of IEEE Visualization '94, pages 100–107, 1994.

[9] M. Meißner, U. Hoffmann, and W. Straßer. Enabling classification and shading for 3D texture mapping based volume rendering. In Proceedings of IEEE Visualization '99, pages 207–214, 1999.

[15] NVIDIA web page. http://www.nvidia.com/.

[16] NVIDIA OpenGL extension specifications document. http://www.nvidia.com/developer.

[17] OpenGL web page. http://www.opengl.org/.

[18] A. V. Oppenheim and R. W. Schafer. Digital Signal Processing. Prentice Hall, Englewood Cliffs, 1975.

[19] C. Rezk-Salama, K. Engel, M. Bauer, G. Greiner, and T. Ertl. Interactive volume rendering on standard PC graphics hardware using multi-textures and multi-stage rasterization. In Proceedings of Eurographics/SIGGRAPH Graphics Hardware Workshop 2000, 2000.

[20] Silicon Graphics, Inc. Pixel textures extension. Specification available from http://www.opengl.org, 1996.

[21] T. Theußl, H. Hauser, and M. E. Gröller. Mastering windows: Improving reconstruction. In Proceedings of IEEE Symposium on Volume Visualization, pages 101–108, 2000.

[22] K. Turkowski. Filters for common resampling tasks. In Andrew S. Glassner, editor, Graphics Gems I, pages 147–165. Academic Press, 1990.

[23] R. Westermann and T. Ertl. Efficiently using graphics hardware in volume rendering applications. In Proceedings of SIGGRAPH '98, pages 169–178, 1998.
Figure 9: Slice from MR data set, reconstructed using different filters (right images show shaded portion of left image enlarged): (a)
Bilinear filter; (b) Blackman windowed sinc; (c) Bicubic Catmull-Rom spline; (d) Bicubic B-spline.