Abstract— In this paper, we present a novel objective non-reference performance assessment algorithm for image fusion. It takes into account local measurements to estimate how well the important information in the source images is represented by the fused image. The metric is based on the Universal Image Quality Index and uses the similarity between blocks of pixels in the input images and the fused image as weighting factors for the metric. Experimental results confirm that the values of the proposed metrics correlate well with the subjective quality of the fused images, giving a significant improvement over standard measures based on mean squared error and mutual information.

Keywords— Fusion performance measures, image fusion, non-reference quality measures, objective quality measures.

The authors are with the Centre for Communications Research, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol BS8 1UB, United Kingdom. This work has been funded by the UK Ministry of Defence Data and Information Fusion Defence Technology Centre. Corresponding author: Nedeljko Cvejic, phone: +44 117 331 5102; fax: +44 117 954 5206, e-mail: n.cvejic@bristol.ac.uk.

I. INTRODUCTION

Image and video fusion is emerging as a vital technology in many military, surveillance and medical applications. It is a subarea of the more general topic of data fusion, dealing with image and video data [1], [2]. The ability to combine complementary information from a range of distributed sensors with different modalities can be used to provide enhanced performance for visualization, detection or classification tasks. Multi-sensor data often present complementary information about the scene or object of interest, and thus image fusion provides an effective method for comparison and analysis of such data. There are several benefits of multi-sensor image fusion: wider spatial and temporal coverage, extended range of operation, decreased uncertainty, improved reliability and increased robustness of the system performance.

In several application scenarios, image fusion is only an introductory stage to another task, e.g. human monitoring. Therefore, the performance of the fusion algorithm must be measured in terms of the improvement in those subsequent tasks. For example, in classification systems, the common evaluation measure is the number of correct classifications. This evaluation requires that the "true" correct classifications are known. However, in experimental setups the ground-truth data might not be available.

In many applications the human perception of the fused image is of fundamental importance, and as a result the fusion results are mostly evaluated by subjective criteria [3], [4]. Objective image fusion performance evaluation is a tedious task due to different application requirements and the lack of a clearly defined ground-truth. Various fusion algorithms presented in the literature [5] have been evaluated objectively by constructing an "ideal" fused image and using it as a reference for comparison with the experimental results [6], [7]. Mean squared error (MSE) based metrics were widely used for these comparisons. Several objective performance measures for image fusion have been proposed where knowledge of the ground-truth is not assumed. In [9], the authors used mutual information as a parameter for evaluating fusion performance. Xydeas and Petrovic [8] proposed a metric that evaluates the relative amount of edge information that is transferred from the input images to the fused image.

In this paper, we present a novel objective non-reference quality assessment algorithm for image fusion. It takes into account local measurements to estimate how well the important information in the source images is represented by the fused image, while minimizing the number of artefacts or the amount of distortion that could interfere with interpretation. Our quality measures are based on an image quality index proposed by Wang and Bovik [10].

II. DEFINITION OF THE UNIVERSAL IMAGE QUALITY INDEX

The measure used as the basis for our objective performance evaluation of image fusion is the Universal Image Quality Index (UIQI) [10]. The authors compared the proposed quality index to the standard MSE objective quality measure, and the main conclusion was that their new index outperforms the MSE due to its ability to measure structural distortions [10].

Let X = {x_i | i = 1, 2, ..., N} and Y = {y_i | i = 1, 2, ..., N} be the original and the test image signals, respectively. The proposed quality index is defined as [10]:

Q = \frac{4\,\sigma_{xy}\,\bar{x}\,\bar{y}}{(\sigma_x^2 + \sigma_y^2)\,[(\bar{x})^2 + (\bar{y})^2]}    (1)

where

\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i, \qquad \bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i    (2)

\sigma_x^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2, \qquad \sigma_y^2 = \frac{1}{N-1} \sum_{i=1}^{N} (y_i - \bar{y})^2    (3)

\sigma_{xy} = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})    (4)

The dynamic range of Q is [−1, 1]. The best value 1 is achieved if and only if y_i = x_i for all i = 1, 2, ..., N.
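As a minimal illustration of Eqs. (1)-(4), the following NumPy sketch computes Q for two equally sized image blocks. The function name and the zero-denominator guard are our own additions and not part of [10].

```python
import numpy as np

def uiqi(x, y):
    """Universal Image Quality Index Q of Eqs. (1)-(4) for two equal-size blocks."""
    x = np.asarray(x, dtype=np.float64).ravel()
    y = np.asarray(y, dtype=np.float64).ravel()
    x_mean, y_mean = x.mean(), y.mean()
    var_x = x.var(ddof=1)                      # unbiased variance, Eq. (3)
    var_y = y.var(ddof=1)
    cov_xy = np.cov(x, y, ddof=1)[0, 1]        # unbiased covariance, Eq. (4)
    num = 4.0 * cov_xy * x_mean * y_mean
    den = (var_x + var_y) * (x_mean ** 2 + y_mean ** 2)
    return num / den if den > 0 else 0.0       # guard against the degenerate case (our choice)
```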
The lowest value of −1 occurs when y_i = 2\bar{x} − x_i for all i = 1, 2, ..., N. This quality index models image distortions as a combination of three different factors: loss of correlation, luminance distortion and contrast distortion. In order to make this more understandable, the definition of Q can be rewritten as a product of three components:

Q = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \cdot \frac{2\,\bar{x}\,\bar{y}}{(\bar{x})^2 + (\bar{y})^2} \cdot \frac{2\,\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}    (5)

The first component is the correlation coefficient between X and Y, and its dynamic range is [−1, 1]. The best value 1 is obtained when y_i = a x_i + b for all i = 1, 2, ..., N, where a and b are constants and a > 0. Even if X and Y are linearly related, there still might be relative distortions between them, and these are evaluated in the second and third components. The second component, with a value range of [0, 1], measures how close the mean luminance is between X and Y. It equals 1 if and only if \bar{x} = \bar{y}. σ_x and σ_y can be viewed as estimates of the contrast of X and Y, so the third component measures how similar the contrasts of the images are. The range of values for the third component is also [0, 1], where the best value 1 is achieved if and only if σ_x = σ_y.

Since images are generally non-stationary signals, it is appropriate to measure Q_0 over local regions and then combine the different results into a single measure Q. In [10] the authors propose to use a sliding window: starting from the top-left corner of the two images X and Y, a sliding window of fixed size moves block by block over the entire image until the bottom-right corner is reached. For each window w the local quality index Q_0(X, Y|w) is computed for the pixels within the sliding window w. Finally, the overall image quality index Q is computed by averaging all local quality indices:

Q(X, Y) = \frac{1}{|W|} \sum_{w \in W} Q_0(X, Y|w)    (6)

where W is the family of all windows and |W| is the cardinality of W. Wang and Bovik [10] have compared (under several types of distortions) their quality index with existing image measures such as the MSE, as well as with subjective evaluations. The tested images were distorted by additive white Gaussian noise, blurring, contrast stretching, JPEG compression, salt and pepper noise, mean shift and multiplicative noise. The main conclusion was that the UIQI outperforms the MSE due to the index's ability to measure structural distortions, in contrast to the MSE, which is highly sensitive to the energy of errors.
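The following sketch shows the block-by-block scan of Eq. (6), reusing the uiqi() helper above. Whether the window advances pixel by pixel or block by block is an implementation detail, so the stride chosen here is an assumption.

```python
def overall_quality(x_img, y_img, win=8):
    """Average the local index Q0(X, Y | w) over all windows, Eq. (6)."""
    rows, cols = x_img.shape
    local_scores = []
    for r in range(0, rows - win + 1, win):          # block-by-block scan (assumed stride)
        for c in range(0, cols - win + 1, win):
            local_scores.append(uiqi(x_img[r:r + win, c:c + win],
                                     y_img[r:r + win, c:c + win]))
    return float(np.mean(local_scores))
```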
In order to apply the UIQI to image fusion evaluation, Piella and Heijmans [11] introduce salient information into the metric:

Q_p(X, Y, F) = \sum_{w \in W} c(w)\,\big[\lambda\, Q(X, F|w) + (1 - \lambda)\, Q(Y, F|w)\big]    (7)

where X and Y are the input images, F is the fused image, c(w) is the overall saliency of a window, and λ is defined as:

\lambda = \frac{s(X|w)}{s(X|w) + s(Y|w)}    (8)

The weight λ should reflect the relative importance of image X compared to image Y within the window w. Here s(X|w) denotes the saliency of image X in window w; it should reflect the local relevance of image X within the window w and may depend on, e.g., contrast, sharpness or entropy. As with the previous metrics, this metric does not require a ground-truth or reference image. Finally, to take into account an aspect of the human visual system (HVS), namely the relevance of edge information, the same measure is computed with the "edge images" (X', Y' and F') instead of the grey-scale images X, Y and F:

Q_E(X, Y, F) = Q_p(X, Y, F)^{1-\alpha} \cdot Q_p(X', Y', F')^{\alpha}    (9)
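A sketch of the weighted index Q_p of Eqs. (7)-(8) is given below, using the local variance as the saliency s(·|w). The particular choice of the window saliency c(w) (the larger of the two variances, normalised over all windows) is an assumption, since [11] leaves this choice open; the uiqi() helper from the earlier sketch is reused.

```python
def piella_qp(x_img, y_img, f_img, win=8):
    """Fusion quality index Qp of Eqs. (7)-(8) with variance as local saliency."""
    rows, cols = x_img.shape
    terms, saliencies = [], []
    for r in range(0, rows - win + 1, win):
        for c in range(0, cols - win + 1, win):
            xb = x_img[r:r + win, c:c + win]
            yb = y_img[r:r + win, c:c + win]
            fb = f_img[r:r + win, c:c + win]
            s_x, s_y = xb.var(ddof=1), yb.var(ddof=1)
            lam = s_x / (s_x + s_y) if (s_x + s_y) > 0 else 0.5     # Eq. (8)
            terms.append(lam * uiqi(xb, fb) + (1.0 - lam) * uiqi(yb, fb))
            saliencies.append(max(s_x, s_y))                        # assumed c(w)
    c_w = np.asarray(saliencies) / np.sum(saliencies)               # normalise the window weights
    return float(np.dot(c_w, terms))
```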
III. PROPOSED IMAGE FUSION PERFORMANCE METRICS

In the computation of Piella's metric, the parameter λ in equation (8) is computed with s(X|w) and s(Y|w) being the variance (or the average in the edge images) of images X and Y within window w, respectively. Therefore, there is no clear measure of how similar each input image is to the final fused image. Each time the metric is calculated, an "edge image" has to be derived from the input images, which adds significantly to the computational complexity of the metric. In addition, the metrics calculated and presented in [11] are given for only one window size (8x8). The window size has a significant influence on this fusion performance measure, as the main weighting factor is the ratio of the variances of the input images, which tends to vary significantly with the window size.

We propose a novel fusion performance measure that takes into account the similarity between each input image block and the fused image block at the same spatial position. It is defined as:

Q_b = \sum_{w \in W} \big[ \mathrm{sim}(X, Y, F|w)\, Q(X, F|w) + (1 - \mathrm{sim}(X, Y, F|w))\, Q(Y, F|w) \big]    (10)

    = \sum_{w \in W} \big[ \mathrm{sim}(X, Y, F|w)\,\big(Q(X, F|w) - Q(Y, F|w)\big) + Q(Y, F|w) \big]    (11)

where X and Y are the input images, F is the fused image, w is the analysis window and W is the family of all windows. We define sim(X, Y, F|w) as:

\mathrm{sim}(X, Y, F|w) = \begin{cases} 0 & \text{if } \dfrac{\sigma_{xf}}{\sigma_{xf} + \sigma_{yf}} < 0 \\ \dfrac{\sigma_{xf}}{\sigma_{xf} + \sigma_{yf}} & \text{if } 0 \le \dfrac{\sigma_{xf}}{\sigma_{xf} + \sigma_{yf}} \le 1 \\ 1 & \text{if } \dfrac{\sigma_{xf}}{\sigma_{xf} + \sigma_{yf}} > 1 \end{cases}    (12)

where

\sigma_{uv} = \frac{1}{N-1} \sum_{i=1}^{N} (u_i - \bar{u})(v_i - \bar{v})    (13)

Each analysis window is weighted by sim(X, Y, F|w), which depends on the spatial-domain similarity between the input images and the fused image. The block from whichever input image is more similar to the fused image block is assigned the larger weighting factor in the calculation of the fusion performance metric, and the impact of the less similar block is accordingly decreased. In this sense, we are able to measure the fusion performance more accurately, especially in an experimental setup where the input images are distorted versions of the ground-truth data, obtained by, e.g., blurring, JPEG compression, noise addition or mean shift. The sim(X, Y, F|w) function is designed to have an upper limit of one, so that the impact of the less significant block is completely eliminated when the other input block's similarity measure equals one. Calculation of the sim(X, Y, F|w) function is computationally far less demanding than the metrics proposed in [8] and [11].
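A sketch of the proposed measure Q_b of Eqs. (10)-(13) is shown below, again reusing the uiqi() helper. The result is averaged over the windows here, which is a normalisation assumption on top of the plain sum written in Eq. (10), and the equal weighting when both covariances vanish is our own guard.

```python
def proposed_qb(x_img, y_img, f_img, win=8):
    """Proposed fusion measure Qb of Eqs. (10)-(13)."""
    rows, cols = x_img.shape
    scores = []
    for r in range(0, rows - win + 1, win):
        for c in range(0, cols - win + 1, win):
            xb = x_img[r:r + win, c:c + win]
            yb = y_img[r:r + win, c:c + win]
            fb = f_img[r:r + win, c:c + win]
            s_xf = np.cov(xb.ravel(), fb.ravel(), ddof=1)[0, 1]     # Eq. (13)
            s_yf = np.cov(yb.ravel(), fb.ravel(), ddof=1)[0, 1]
            total = s_xf + s_yf
            sim = s_xf / total if total != 0 else 0.5               # equal weighting is our guard
            sim = min(max(sim, 0.0), 1.0)                           # clipping as in Eq. (12)
            scores.append(sim * uiqi(xb, fb) + (1.0 - sim) * uiqi(yb, fb))   # Eq. (10)
    return float(np.mean(scores))                                   # averaging is an assumption
```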
IV. EXPERIMENTAL RESULTS

In this section we use the proposed fusion quality measure in (10) to evaluate several multiresolution (MR) image fusion algorithms and compare it to standard objective image metrics. The MR-based image fusion approach consists of performing an MR transform on each input image and, following specific rules, combining them into a composite MR representation. The composite image is obtained by applying the inverse transform to this composite MR representation [2].

During the tests we use the simple averaging method, the ratio pyramid, the Principal Component Analysis (PCA) method and the discrete wavelet transform (DWT), and in all MR cases we perform a 5-level decomposition. We fuse the coefficients of the MR decomposition of each input image by selecting, at each position, the coefficient with the maximum absolute value, except for the coefficients at the lowest resolution, where the fused coefficient equals the mean value of the coefficients in that subband.
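A sketch of the DWT fusion rule just described is given below, written with the PyWavelets package (pywt). The wavelet family ("db2") is not specified in the paper and is an assumption here.

```python
import numpy as np
import pywt  # PyWavelets

def fuse_dwt_maxabs(img_x, img_y, wavelet="db2", levels=5):
    """5-level DWT fusion: max-absolute selection for detail coefficients,
    mean of the two coarsest approximation subbands."""
    cx = pywt.wavedec2(np.asarray(img_x, dtype=np.float64), wavelet, level=levels)
    cy = pywt.wavedec2(np.asarray(img_y, dtype=np.float64), wavelet, level=levels)
    fused = [(cx[0] + cy[0]) / 2.0]                        # lowest-resolution subband: average
    for dx, dy in zip(cx[1:], cy[1:]):                     # (cH, cV, cD) tuples per level
        fused.append(tuple(np.where(np.abs(a) >= np.abs(b), a, b)
                           for a, b in zip(dx, dy)))
    return pywt.waverec2(fused, wavelet)
```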
The first pair of test images used is the complementary pair shown in the top row of Fig. 1. The test images have been created artificially by blurring the original "Goldhill" image of size 512x512, using Gaussian blurring with a radius of 10 pixels. The images are complementary in the sense that the blurring takes place in complementary horizontal strips in the first and the second image, respectively. The fused images obtained by the average method, the ratio pyramid, the PCA method and DWT domain fusion are depicted in the second and third rows, from left to right. Table I compares the quality of these composite images using our proposed quality measures. The first three rows correspond to the proposed fusion quality measure, as defined in (10), calculated for three window sizes (4x4, 8x8 and 16x16 pixels) in order to examine the dependence of the metric's output values on the analysis window size.

For comparison, we also compute the PSNR between the original "Goldhill" image and each of the generated fused images. In real-life image fusion scenarios we do not have access to the original image, so the PSNR value is provided just as a reference. In addition, we have provided as references the fusion performance metric developed by Xydeas and Petrovic [8] (given in the fourth row of Tables I-III) and the metric based on mutual information [9] (the fifth row of Tables I-III).
The fusion metric proposed by Xydeas and Petrovic [8] is obtained by evaluating the relative amount of edge information transferred from the input images to the output image. It also takes into account the relative perceptual importance of the visual information found in the input images, by assigning perceptual importance weights to more salient edges. It uses a Sobel edge operator to calculate the strength g(n, m) and orientation α(n, m) of each pixel in the input and output images. The relative strength and orientation "change" values, G^{AF}(n, m) and A^{AF}(n, m), of an input image A with respect to the fused image F are defined as:

G^{AF}(n, m) = \begin{cases} \dfrac{g_F(n, m)}{g_A(n, m)} & \text{if } g_A(n, m) > g_F(n, m) \\ \dfrac{g_A(n, m)}{g_F(n, m)} & \text{otherwise} \end{cases}    (14)

A^{AF}(n, m) = \frac{\big|\,|\alpha_A(n, m) - \alpha_F(n, m)| - \pi/2\,\big|}{\pi/2}    (15)

These measures are then used to estimate the edge strength and orientation preservation values, Q_g^{AF}(n, m) and Q_\alpha^{AF}(n, m):

Q_g^{AF}(n, m) = \frac{\Gamma_g}{1 + e^{k_g (G^{AF}(n, m) - \sigma_g)}}    (16)

Q_\alpha^{AF}(n, m) = \frac{\Gamma_\alpha}{1 + e^{k_\alpha (A^{AF}(n, m) - \sigma_\alpha)}}    (17)

where the constants Γ_g, k_g, σ_g and Γ_α, k_α, σ_α determine the exact shape of the sigmoid nonlinearities used to form the edge strength and orientation preservation values. The overall edge information preservation values are then defined as:

Q^{AF}(n, m) = Q_g^{AF}(n, m) \cdot Q_\alpha^{AF}(n, m), \qquad 0 \le Q^{AF}(n, m) \le 1    (18)

Given Q^{AF}(n, m) and Q^{BF}(n, m), a normalised weighted performance metric of a given process p that fuses A and B into F is defined as:

Q_p = \frac{\sum_{n=1}^{N} \sum_{m=1}^{M} \big[ Q^{AF}(n, m)\, w^A(n, m) + Q^{BF}(n, m)\, w^B(n, m) \big]}{\sum_{n=1}^{N} \sum_{m=1}^{M} \big[ w^A(n, m) + w^B(n, m) \big]}    (19)

The edge preservation values Q^{AF}(n, m) and Q^{BF}(n, m) are weighted by coefficients w^A(n, m) and w^B(n, m), which reflect the perceptual importance of the corresponding edge elements within the input images. Note that in this method the visual information is associated with the edge information, while the region information is ignored. This metric will be referred to as the "Petrovic" metric in the rest of the paper.
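As an illustration only, the following sketch computes the per-pixel edge preservation Q^{AF}(n, m) of Eqs. (14)-(18) with SciPy's Sobel operator. The sigmoid constants Γ, k and σ are placeholder values, since this paper does not list them, and the orientation is taken in (−π/2, π/2) so that Eq. (15) stays in [0, 1].

```python
import numpy as np
from scipy.ndimage import sobel

def edge_preservation(img_a, img_f,
                      gamma_g=1.0, k_g=-10.0, sig_g=0.5,
                      gamma_a=1.0, k_a=-20.0, sig_a=0.75):
    """Per-pixel Q^AF(n, m) of Eqs. (14)-(18); all constants are illustrative only."""
    def strength_and_orientation(img):
        gx = sobel(np.asarray(img, dtype=np.float64), axis=1)
        gy = sobel(np.asarray(img, dtype=np.float64), axis=0)
        return np.hypot(gx, gy), np.arctan(gy / (gx + 1e-12))
    g_a, alpha_a = strength_and_orientation(img_a)
    g_f, alpha_f = strength_and_orientation(img_f)
    eps = 1e-12
    g_change = np.where(g_a > g_f, g_f / (g_a + eps), g_a / (g_f + eps))      # Eq. (14)
    a_change = np.abs(np.abs(alpha_a - alpha_f) - np.pi / 2) / (np.pi / 2)    # Eq. (15)
    q_g = gamma_g / (1.0 + np.exp(k_g * (g_change - sig_g)))                  # Eq. (16)
    q_a = gamma_a / (1.0 + np.exp(k_a * (a_change - sig_a)))                  # Eq. (17)
    return q_g * q_a                                                          # Eq. (18)
```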
Mutual information has emerged as an alternative to the PSNR. It measures the degree of dependence of two random variables A and B and is defined through the Kullback-Leibler measure:

I_{AB}(a, b) = \sum_{a,b} p_{AB}(a, b) \cdot \log \frac{p_{AB}(a, b)}{p_A(a) \cdot p_B(b)}    (20)

where p_{AB}(a, b) is the joint distribution and p_A(a) \cdot p_B(b) is the distribution associated with the case of complete independence. Considering two input images A, B and a new fused image F, the amount of information that F contains about A and B can be calculated as:

I_{FA}(f, a) = \sum_{f,a} p_{FA}(f, a) \cdot \log \frac{p_{FA}(f, a)}{p_F(f) \cdot p_A(a)}    (21)

I_{FB}(f, b) = \sum_{f,b} p_{FB}(f, b) \cdot \log \frac{p_{FB}(f, b)}{p_F(f) \cdot p_B(b)}    (22)

and the image fusion performance measure can be defined as:

M_F^{AB} = I_{FA}(f, a) + I_{FB}(f, b)    (23)
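A histogram-based sketch of the mutual information measure in Eqs. (20)-(23) is given below; the number of grey-level bins is an assumption.

```python
import numpy as np

def mutual_information(img_u, img_v, bins=256):
    """Histogram estimate of the mutual information between two images, Eqs. (20)-(22)."""
    joint, _, _ = np.histogram2d(img_u.ravel(), img_v.ravel(), bins=bins)
    p_uv = joint / joint.sum()
    p_u = p_uv.sum(axis=1, keepdims=True)      # marginal of the first image
    p_v = p_uv.sum(axis=0, keepdims=True)      # marginal of the second image
    nz = p_uv > 0                              # ignore empty histogram cells
    return float(np.sum(p_uv[nz] * np.log(p_uv[nz] / (p_u @ p_v)[nz])))

def mi_fusion_measure(img_a, img_b, img_f, bins=256):
    """Fusion performance M_F^AB = I_FA + I_FB, Eq. (23)."""
    return (mutual_information(img_f, img_a, bins) +
            mutual_information(img_f, img_b, bins))
```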
TABLE I. Comparison of different objective quality measures for the composite images in Fig. 1.

TABLE II. Comparison of different objective quality measures for the composite images in Fig. 2.

TABLE III. Comparison of different objective quality measures for the composite images in Fig. 3.
The following two pairs of input images are contaminated by Gaussian additive noise (Fig. 2) and salt and pepper (SP) noise (Fig. 3). Although the additive noise could be tackled by hard thresholding in the transform domain, and the SP noise by median filtering, we did not perform denoising, in order to obtain more balanced data for the proposed metric. The results for the noisy input images are given in Table II and Table III for the images distorted by Gaussian additive noise and SP noise, respectively.

Test results show that the DWT domain fusion visually outperforms the other three schemes. This is most noticeable in, for instance, the blurring (e.g., of edges in the background and small details) and the loss of texture in the fused images obtained by the ratio pyramid and averaging. Furthermore, in the fused image produced by the ratio-pyramid method, some details of the images and background have been completely lost, and in the average composite image the loss of contrast is very evident. These subjective visual comparisons agree with the results obtained by the proposed metric, presented in Tables I-III. Note that the proposed metric gives quality measures very similar to those of the Petrovic metric, and that these two metrics considerably outperform the MI measure and the PSNR. It is clear from the experiments that the MI metric and the PSNR often assign the highest fusion performance value to an algorithm that does not perform well in subjective terms. The values obtained from the proposed metric correlate well with the subjective quality of the fused images, which was not achievable with the standard MI fusion performance measure or the PSNR. In addition, the proposed metric does not depend significantly on the size of the analysis window, as the difference in fusion performance does not change extensively with the variation of the window size.
Fig. 2. Fusion results for the original image randomly blurred, with Gaussian noise added. Top row: input image X (left), input image Y (right). Second row: fused image F using averaging (left), fused image F using ratio pyramid decomposition (right). Bottom row: fused image F using the PCA decomposition (left), fused image F using DWT domain fusion (right).

Fig. 3. Fusion results for the original image randomly blurred, with salt and pepper noise added. Top row: input image X (left), input image Y (right). Second row: fused image F using averaging (left), fused image F using ratio pyramid decomposition (right). Bottom row: fused image F using the PCA decomposition (left), fused image F using DWT domain fusion (right).
V. CONCLUSIONS

We present a novel objective non-reference performance assessment algorithm for image fusion. It takes into account local measurements to estimate how well the important information in the source images is represented by the fused image. Experimental results confirm that the values of the proposed metrics correlate well with the subjective quality of the fused images, giving a significant improvement over standard measures based on mean squared error and mutual information. Compared to previously presented fusion performance measures [8], [11], it obtains comparable results with considerably decreased computational complexity.

Further research will focus on how to select the salient points in order to optimize the fusion performance. Another extension of the work will be a performance measure based on regions of the image, obtained by segmentation of the input images, rather than calculating the measure in square windows.

REFERENCES
[1] H. Maitre and I. Bloch, "Image fusion", Vistas in Astronomy, Vol. 41, No. 43, 1997, pp. 329-335.
[2] S. Nikolov, P. Hill, D. Bull and N. Canagarajah, "Wavelets for image fusion", in Wavelets in Signal and Image Analysis, Kluwer, Dordrecht, The Netherlands, 2001.
[3] D. Ryan and R. Tinkler, "Night pilotage assessment of image fusion", Proc. SPIE, Orlando, FL, 1995, pp. 50-67.
[4] A. Toet and E. M. Franken, "Perceptual evaluation of different image fusion schemes", Displays, Vol. 24, No. 1, 2003, pp. 25-37.
[5] G. Piella, "A general framework for multiresolution image fusion: from pixels to regions", Information Fusion, Vol. 4, 2003, pp. 259-280.
[6] H. Li, B. S. Manjunath and S. K. Mitra, "Multisensor image fusion using the wavelet transform", Graphical Models and Image Processing, Vol. 57, No. 3, 1995, pp. 235-245.
[7] O. Rockinger, "Image sequence fusion using a shift invariant wavelet transform", Proc. IEEE International Conference on Image Processing, Washington, DC, 1997, pp. 288-291.
[8] C. Xydeas and V. Petrovic, "Objective pixel-level image fusion performance measure", Proc. SPIE, Orlando, FL, 2000, pp. 88-99.
[9] G. H. Qu, D. L. Zhang and P. F. Yan, "Information measure for performance of image fusion", Electronics Letters, Vol. 38, No. 7, 2002, pp. 313-315.
[10] Z. Wang and A. C. Bovik, "A universal image quality index", IEEE Signal Processing Letters, Vol. 9, No. 3, 2002, pp. 81-84.
[11] G. Piella and H. Heijmans, "A new quality metric for image fusion", Proc. IEEE International Conference on Image Processing, Barcelona, Spain, 2003, pp. 173-176.