Automatic No-Reference Quality Assessment for Retinal Fundus Images Using Vessel Segmentation

Thomas Köhler (1,2), Attila Budai (1,2), Martin F. Kraus (1,2), Jan Odstrcilik (4,5), Georg Michelson (2,3), Joachim Hornegger (1,2)

(1) Pattern Recognition Lab, University of Erlangen-Nuremberg, Erlangen, Germany
(2) Erlangen Graduate School in Advanced Optical Technologies (SAOT), Erlangen, Germany
(3) Department of Ophthalmology, University of Erlangen-Nuremberg, Erlangen, Germany
(4) Department of Biomedical Engineering, Brno University of Technology, Brno, Czech Republic
(5) St. Anne's University Hospital - International Clinical Research Center (ICRC), Brno, Czech Republic

thomas.koehler@fau.de

Abstract

Fundus imaging is the most commonly used modality to collect information about the human eye background. Objective and quantitative assessment of quality for the acquired images is essential for manual, computer-aided and fully automatic diagnosis. In this paper, we present a no-reference quality metric to quantify image noise and blur and its application to fundus image quality assessment. The proposed metric takes the vessel tree visible on the retina as guidance to determine an image quality score. In our experiments, the performance of this approach is demonstrated by correlation analysis with the established full-reference metrics peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). We found a Spearman rank correlation of 0.89 for PSNR and 0.91 for SSIM. For real data, our metric correlates reasonably with a human observer, indicating high agreement with human visual perception.

1 Introduction

Fundus imaging is the most commonly used modality to collect information about the human eye background for the diagnosis of various retinal diseases such as glaucoma or diabetic retinopathy. Retinal image analysis is an active field of research providing image processing and machine learning methods either for computer-assisted or fully automatic diagnoses [1]. However, for the success of these methods high-quality image data is essential. Images of poor quality must be detected by an operator and the acquisition must be repeated, which is a highly subjective decision and a time-consuming task. Furthermore, image processing techniques ranging from autofocus [6] to image deconvolution [5] recently established in ophthalmic imaging deal with images of different quality that must be quantitatively assessed to detect the most reliable image (e.g. for autofocus) or to evaluate image improvement (e.g. in deconvolution).

No-reference image quality assessment deals with the problem of providing a quantitative score of image quality in the absence of a gold standard. In the literature, there exist two groups of methods for solving this task: (i) classification-based approaches and (ii) quality metrics for image content. In methods falling into category (i), image quality is predicted by assigning an image to one class out of a discrete set of quality classes using supervised learning strategies. This is achieved using feature extraction and classification based on a gold standard provided by experts [8, 9]. Even if such methods are attractive for identifying good images for diagnosis purposes, their application is limited to problems where a discrete assessment is sufficient. (ii) Quality metrics are scores for general quality features like image noise or sharpness that provide a continuous measure in an unsupervised manner, which is the focus of this paper. For the prediction of the relative amount of sharpness in natural images, Narvekar and Karam [7] proposed the cumulative probability of blur detection (CPBD). A quality metric to estimate image noise and blur simultaneously is the Rényi entropy based anisotropy measure [3], adopted to fundus imaging by Marrugo et al. [4]. A major limitation of these methods is that a uniform quality across the whole image is assumed, which is not always valid in the case of fundus images. This is caused by the curvature of the retina or by diseases which introduce local blur. Zhu et al. [12] proposed a novel metric to quantify noise and blur which does not require uniform disturbances.

In this work, we focus on the simultaneous quantification of image blur and noise. Here, we adopt the approach originally introduced by Zhu et al. [12] for the automatic quality assessment of retinal fundus images.

In section 2, we present the state-of-the-art approach, which is applicable to any image modality. Our specialized approach for fundus images is introduced in section 3; it takes the vessel tree as guidance to determine an objective and continuous quality score. The performance is demonstrated by analyzing the correlation between the no-reference assessment and established full-reference metrics in section 4. This method may be used either as a stand-alone metric to detect blurred and noisy images or as a feature in a classification-based approach.

2 Background

Let I be a grayscale image of size M × N. We decompose I into a set of distinct patches, where each patch P is of size n × n. The local gradient matrix G of size n^2 × 2 for P is given by:

G = \begin{pmatrix} P_x(1,1) & P_y(1,1) \\ \vdots & \vdots \\ P_x(n,n) & P_y(n,n) \end{pmatrix},    (1)

where P_x(x_i, y_i) and P_y(x_i, y_i) denote the derivatives of P at pixel (x_i, y_i) in x- and y-direction, respectively. The singular value decomposition (SVD) of G is given by:

G = U D V^T    (2)
  = U \begin{pmatrix} s_1 & 0 \\ 0 & s_2 \end{pmatrix} V^T,    (3)

for orthogonal matrices U and V and the singular values s_1, s_2. It is shown in [12] that a local quality metric to quantify image noise and blur in an anisotropic patch P is given by:

q(P) = s_1 R,    (4)

where R denotes the coherence:

R = \frac{s_1 - s_2}{s_1 + s_2}.    (5)

Larger values for q(P) defined in (4) indicate higher image quality in terms of blur and noise. It is important to note that q(P) is only a valid quality metric in an anisotropic patch with dominant gradient direction, whereas in isotropic patches the score is not meaningful. In order to get a global estimate for noise and blur, q(P) is summed up over all anisotropic patches and normalized according to:

Q = \frac{1}{MN} \sum_{i,j:\, \mathcal{P}(i,j)=1} q(P_{ij}),    (6)

where \mathcal{P}(i,j) denotes the patch map for image I such that \mathcal{P}(i,j) = 1 if P_{ij} is anisotropic. These patches may be detected automatically by employing statistical tests for the coherence R (see Fig. 1). Patches having significant coherences R > R̃ are assumed to be anisotropic for a fixed threshold R̃ calculated by:

\tilde{R} = \sqrt{\frac{1 - \alpha^{1/(n^2-1)}}{1 + \alpha^{1/(n^2-1)}}},    (7)

where α is the significance level for testing if a given patch is anisotropic. For the patch size we set n = 8 and for the significance level α = 0.001, as suggested in [12].

Figure 1: A color fundus image (a) and the detected anisotropic patches of size 8 × 8 pixels (b).
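To make the patch-wise computation behind Eqs. (1)-(7) concrete, the following Python sketch (our own illustration rather than the authors' implementation; the helper names patch_quality and global_quality are hypothetical, and the exponent 1/(n^2 - 1) in the threshold follows our reconstruction of Eq. (7)) computes q(P), the coherence R, the anisotropy threshold and the global score Q for a grayscale image stored as a NumPy array:

    import numpy as np

    def patch_quality(patch):
        """Local metric q(P) = s1 * R and coherence R for one n x n patch."""
        # Gradients in x- and y-direction, stacked into the n^2 x 2 matrix G (Eq. 1).
        gy, gx = np.gradient(patch.astype(float))
        G = np.column_stack([gx.ravel(), gy.ravel()])
        # Singular values s1 >= s2 of G (Eqs. 2-3).
        s1, s2 = np.linalg.svd(G, compute_uv=False)
        R = (s1 - s2) / (s1 + s2 + 1e-12)            # coherence (Eq. 5)
        return s1 * R, R                             # q(P) = s1 * R (Eq. 4)

    def global_quality(image, n=8, alpha=1e-3):
        """Global score Q (Eq. 6), summed over anisotropic n x n patches."""
        M, N = image.shape
        xi = alpha ** (1.0 / (n * n - 1))            # significance level alpha
        r_thresh = np.sqrt((1.0 - xi) / (1.0 + xi))  # coherence threshold (Eq. 7)
        total = 0.0
        for i in range(0, M - n + 1, n):
            for j in range(0, N - n + 1, n):
                q, R = patch_quality(image[i:i + n, j:j + n])
                if R > r_thresh:                     # keep anisotropic patches only
                    total += q
        return total / (M * N)                       # normalization as in Eq. (6)

Larger return values indicate sharper, less noisy images, in line with the interpretation of Eq. (4).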
3 Proposed Method

In this section, we describe how the no-reference quality metric Q is applied to color fundus images. Next, we propose an extended approach which takes the blood vessels visible in fundus images as guidance to determine a spatially weighted quality score.

3.1 Color Image Quality Assessment

The quality metric Q given by Eq. (6) is defined for grayscale images only. However, in fundus imaging a quality score for color images is required. Here, contrast and saturation of the color channels in RGB space are different. Usually the blue channel has poor contrast between background and anatomical structures, whereas the red channel is often overexposed. However, we assume uniform quality of the channels with respect to noise and sharpness. Therefore, we propose to extract the green color channel for quality assessment, since it shows the best contrast and provides maximal structure for subsequent quality assessment.
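As a minimal illustration of this step (our own snippet; it assumes the image is stored as an H x W x 3 array in R, G, B channel order), the color image is reduced to its green channel before scoring:

    import numpy as np

    def green_channel(rgb_image):
        """Return the green channel of an RGB fundus image as a float array."""
        rgb = np.asarray(rgb_image, dtype=float)
        return rgb[..., 1]                 # channel order assumed to be R, G, B

    # Example, using the global_quality sketch from Section 2:
    # Q = global_quality(green_channel(fundus_rgb))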
3.2 Vessel-Based Quality Assessment

One limitation of the quality metric Q is the automatic detection of anisotropic patches. Using thresholding procedures may lead to false selections, especially in the case of noisy or highly blurred images. On the other hand, fundus images contain relatively little texture and structure compared to natural images. Thus, the number of possible candidates for anisotropic patches is low. To overcome this problem, we propose to use the vessel tree as guidance, since we expect that blood vessel boundaries are good candidates for true anisotropic patches.

3.2.1 Vesselness Measure

We detect blood vessels in an image I as follows. First, the green color channel I_g is extracted from the color image I due to the good contrast between vessels and background compared to the other channels. For each pixel in I_g the local Hessian matrix is calculated by:

H = \begin{pmatrix} \partial^2 I_g / \partial x^2 & \partial^2 I_g / \partial x \partial y \\ \partial^2 I_g / \partial x \partial y & \partial^2 I_g / \partial y^2 \end{pmatrix}.    (8)

For the detection of blood vessels, we employ the vesselness measure proposed by Frangi et al. [2] according to:

V = \exp\left(-\frac{\lambda_1^2}{2 \lambda_2^2}\right) \left(1 - \exp\left(-\frac{\lambda_1^2 + \lambda_2^2}{2}\right)\right)    (9)

for the eigenvalues λ_1 and λ_2 of H, where λ_2 ≥ λ_1. Here, V represents a probability measure where large values indicate a high probability for a pixel to be located on a vessel (see Fig. 2). Since we are mainly interested in thick vessels, and in order to decrease noise in the vesselness map, we neglect pixels having small vesselness and set V = 0 for V < V_0 below a fixed threshold V_0. We set V_0 adaptively to the 80th percentile of all non-zero vesselness measurements.

In its original version, the vesselness is computed pixel-wise according to (9) using different window sizes to determine the Hessian. Then, the size achieving the largest vesselness is used for vessel detection. In this paper, we make use of a multi-scale approach and determine the vesselness for a fixed window of size 3 × 3 pixels but at downsampled versions of the original image. This method speeds up the computation of V for vessel detection.

Figure 2: An example color image (a) and the calculated vesselness measure for blood vessel detection (b).
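A possible implementation of this step is sketched below (again our own code, not the authors'; the Gaussian-derivative Hessian, the smoothing scale sigma and the exact weighting are assumptions on top of Eqs. (8)-(9)). It computes the Hessian eigenvalues per pixel, evaluates the vesselness and suppresses responses below the 80th percentile of the non-zero values; the multi-scale variant described above would run the same function on downsampled copies of the image instead of varying sigma:

    import numpy as np
    from scipy import ndimage

    def vesselness_map(green, sigma=1.5, percentile=80):
        """Hessian-based vesselness in the spirit of Eqs. (8)-(9)."""
        g = green.astype(float)
        # Second-order Gaussian derivatives approximate the Hessian entries (Eq. 8).
        ixx = ndimage.gaussian_filter(g, sigma, order=(0, 2))
        iyy = ndimage.gaussian_filter(g, sigma, order=(2, 0))
        ixy = ndimage.gaussian_filter(g, sigma, order=(1, 1))
        # Eigenvalues of the 2 x 2 Hessian at every pixel.
        mean = 0.5 * (ixx + iyy)
        diff = np.sqrt((0.5 * (ixx - iyy)) ** 2 + ixy ** 2)
        lam1, lam2 = mean - diff, mean + diff
        # Order the eigenvalues such that |lam1| <= |lam2|.
        swap = np.abs(lam1) > np.abs(lam2)
        lam1[swap], lam2[swap] = lam2[swap], lam1[swap]
        # Vesselness (Eq. 9): tube-likeness times overall structure strength.
        v = np.exp(-lam1 ** 2 / (2.0 * lam2 ** 2 + 1e-12)) * \
            (1.0 - np.exp(-(lam1 ** 2 + lam2 ** 2) / 2.0))
        # Adaptive threshold V0: 80th percentile of the non-zero responses.
        nonzero = v[v > 0]
        v0 = np.percentile(nonzero, percentile) if nonzero.size else 0.0
        v[v < v0] = 0.0
        return v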
3.2.2 Spatially Weighted Quality Metric

We utilize the vesselness as a spatially adaptive confidence weight for quality assessment of retinal fundus images. Here, the basic idea is that anisotropic patches located on blood vessel boundaries are more reliable for the overall blur and noise estimate. Our vessel-based quality metric is defined as:

Q_v = \sum_{i,j:\, \mathcal{P}(i,j)=1} \omega_{ij}\, q(P_{ij}),    (10)

where ω_ij denotes the normalized local variance of the vesselness measure in patch P_ij. Here, ω_ij is determined by computing the variance of the vesselness V in P_ij, where normalization is done using the overall patch number such that \sum_{i,j} ω_ij = 1. Thus, patches P_ij located on a blood vessel boundary, indicated by large ω_ij, have a higher impact on the overall estimate for image noise and blur. We demonstrate in our experiments that this quality score is more reliable than thresholding for patch detection and the use of uniform weights for all patches.
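The weighting itself can be sketched as follows (our own code, reusing the patch_quality and vesselness_map helpers from the earlier sketches; normalising the weights so that they sum to one is our reading of the text):

    import numpy as np

    def vessel_weighted_quality(green, V, n=8, alpha=1e-3):
        """Vessel-weighted score Qv (Eq. 10) from the green channel and vesselness map V."""
        # patch_quality: see the sketch after Section 2.
        M, N = green.shape
        xi = alpha ** (1.0 / (n * n - 1))
        r_thresh = np.sqrt((1.0 - xi) / (1.0 + xi))   # coherence threshold (Eq. 7)
        scores, weights = [], []
        for i in range(0, M - n + 1, n):
            for j in range(0, N - n + 1, n):
                q, R = patch_quality(green[i:i + n, j:j + n])
                if R > r_thresh:                      # anisotropic patches only
                    scores.append(q)
                    weights.append(np.var(V[i:i + n, j:j + n]))
        w = np.asarray(weights)
        if w.sum() == 0.0:
            return 0.0
        w = w / w.sum()                               # sum_ij w_ij = 1
        return float(np.dot(w, np.asarray(scores)))   # Qv = sum_ij w_ij * q(P_ij)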
4 Experiments and Results

We evaluated the ability of the proposed quality metric to quantify sharpness and noise in retinal fundus images. First, our metric Q_v defined in (10) is compared to the original score Q defined in (6) by analyzing the agreement with full-reference metrics based on synthetic images. We also evaluated our approach on real image data. Supplementary material for our experiments is available on our web page (http://www5.cs.fau.de/en/our-team/koehler-thomas).

4.1 Correlation to Full-Reference Metrics

For quantitative evaluation, we used 40 images from the DRIVE database [10]. From each original image we generated synthetic images in two steps: (i) we induced blur using a Gaussian filter with a fixed size of 7 × 7 and varying standard deviation σ_b; (ii) from each blurred image, a noisy image was generated by adding zero-mean Gaussian noise of varying standard deviation σ_n (see Fig. 3). Given an original image I and a disturbed image Ĩ, we employ the established full-reference quality metrics peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) [11] to quantify the degradation of Ĩ. For the evaluation of the reliability of the proposed metric, we calculated Spearman's rank correlation ρ between the full-reference metrics and the no-reference quality scores.
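The synthetic part of this protocol could be reproduced along the following lines (an illustrative sketch with our own parameter choices; vessel_weighted_quality refers to the earlier sketch, and SciPy / scikit-image are assumed to be available for the degradation, PSNR, SSIM and Spearman's rho):

    import numpy as np
    from scipy import ndimage
    from scipy.stats import spearmanr
    from skimage.metrics import peak_signal_noise_ratio, structural_similarity

    def degrade(image, sigma_blur, sigma_noise, rng):
        """Gaussian blur followed by additive zero-mean Gaussian noise (Sec. 4.1)."""
        blurred = ndimage.gaussian_filter(image, sigma_blur)
        noisy = blurred + rng.normal(0.0, sigma_noise, image.shape)
        return np.clip(noisy, 0.0, 1.0)

    def correlation_with_full_reference(reference, score_fn, rng):
        """Spearman's rho between a no-reference score and PSNR / SSIM."""
        blur_levels = np.geomspace(0.5, 3.0, 20)      # logarithmically spaced
        noise_levels = np.geomspace(1e-4, 1e-2, 20)
        scores, psnrs, ssims = [], [], []
        for sb in blur_levels:
            for sn in noise_levels:
                d = degrade(reference, sb, sn, rng)
                scores.append(score_fn(d))
                psnrs.append(peak_signal_noise_ratio(reference, d, data_range=1.0))
                ssims.append(structural_similarity(reference, d, data_range=1.0))
        rho_psnr, _ = spearmanr(scores, psnrs)
        rho_ssim, _ = spearmanr(scores, ssims)
        return rho_psnr, rho_ssim

Here, reference is a grayscale image scaled to [0, 1] and score_fn is, for example, a wrapper around vessel_weighted_quality.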

Figure 3: Region of interest of an image used as ground truth (a), generated blurred image (σ_b = 3.0) (b), generated noisy image (σ_n = 10^-2) (c) and generated blurred and noisy image (d).

4.1.1 Assessment of Blur

In our first experiment, we varied the amount of Gaussian blur from σ_b = 0.5 to σ_b = 3.0. For each blur level, 20 different noise levels ranging from σ_n = 10^-4 to σ_n = 10^-2 were simulated. Both parameters were increased logarithmically to achieve a uniform sampling of the corresponding PSNR and SSIM measures. Spearman's ρ was calculated for all blur levels between each no-reference quality score (Q and Q_v) and PSNR as well as SSIM. Mean and standard deviation of ρ averaged over 40 images are plotted in Fig. 4. If σ_b becomes large, ρ decreases on average and has a higher standard deviation. Even for large σ_b we achieve correlations higher than 0.8 between the full-reference metrics and the no-reference metrics. Please note that the correlations for PSNR and SSIM were equal in this experiment, since both scores were perfectly correlated.

4.1.2 Assessment of Image Noise

We repeated our first experiment and calculated Spearman's ρ for different noise standard deviations σ_n. Mean and standard deviation of ρ averaged over 40 images are plotted in Fig. 5. Here, Spearman's ρ decreased and had a higher standard deviation as the noise level σ_n was increased. However, for moderate noise levels (σ_n ≤ 3 · 10^-3) we still achieve correlations higher than 0.6 for Q and Q_v.

4.1.3 Overall Correlation

We also analyzed Spearman's ρ over a whole experiment, where image noise and blur were varied simultaneously for 40 images as well as 20 noise and blur levels, respectively. A comparison between Q and Q_v is shown in Tab. 1. Here, we achieve a Spearman correlation above 0.8 for both approaches with respect to the full-reference metrics PSNR and SSIM.

Table 1: Spearman's ρ for simultaneously varying noise and blur for the metric Q and our proposed metric Q_v.

Full-ref. metric | ρ(Q)   | ρ(Q_v)
PSNR             | 0.8227 | 0.8920
SSIM             | 0.8412 | 0.9076

4.2 Real Images

We captured 18 image pairs of the same eye from 18 human subjects using a Canon CR-1 fundus camera with a field of view of 45°. For each pair, the first image suffers from decreased sharpness, and thus the examination had to be repeated. Both images share approximately the same field of view, whereas small shifts were caused by eye movements between the acquisitions (see Fig. 6). The proposed metric Q_v was compared to the original metric Q [12] as well as to the CPBD metric [7] and the anisotropy measure [3]. All metrics were applied to the field of view, whereas the background regions were masked out for quality assessment. For normalization, we used the m = 10^4 most significant anisotropic patches to determine Q and Q_v for each image, in order to neglect the effect of the patch number on the global quality score.

We considered quality classification implemented as thresholding of the estimated quality score. ROC curves for classification based on the different metrics are shown in Fig. 7. For Q_v we obtained an area under the ROC curve of 88.3% (CPBD: 50.9%, anisotropy: 75.3%, Q: 79.6%). The metric Q_v was also compared pair-wise between a good acquisition and the corresponding image of poor quality. Here, the ranking obtained by Q_v agrees with a human observer for 16 out of 18 image pairs, resulting in an agreement of 88.9% (CPBD: 55.6%, anisotropy: 94.4%, Q: 83.3%).

Figure 4: Mean and standard deviation of Spearman's ρ between no-reference quality assessment and PSNR ((a) and (b)) as well as SSIM ((c) and (d)) for 40 test images versus varying amount of artificial blur.

Figure 5: Mean and standard deviation of Spearman's ρ between no-reference quality assessment and PSNR ((a) and (b)) as well as SSIM ((c) and (d)) for 40 test images versus varying amount of additive Gaussian noise.

4.3 Discussion

As shown in Fig. 4, for each blur level our metric Q_v outperforms the original approach with respect to mean and standard deviation of Spearman's ρ. This is especially noticeable for high amounts of blur. In contrast to this result, in the case of varying noise levels shown in Fig. 5, ρ is lower for both Q and Q_v. However, the mean correlation for Q_v is still improved. Spearman's ρ for simultaneously varying noise and blur, summarized in Tab. 1, indicates higher correlations of Q_v with the full-reference metrics at a significance level of 0.05. Thus, Q_v has a higher agreement with both full-reference metrics over a wide range of noise and blur.

In our experiments using real images, Q_v agrees reasonably with the visual inspection of the camera's operator. This is also the case for non-uniform degradations such as spatially varying blur (see Fig. 6a). In terms of quality classification, our approach outperforms state-of-the-art methods, indicated by an improved area under the ROC curve. Please note that our method measures blur and noise, whereas related aspects such as illumination homogeneity are not explicitly taken into account. However, Q_v may be combined with various features to assess different quality criteria.

5 Conclusion

In this paper, we presented an improved no-reference image quality metric Q_v to quantify the amount of noise and blur in retinal fundus images. For reliable quality estimation, we employ the vessel tree, detected by the well-known vesselness measure, as guidance to determine a global quality score from local estimates in anisotropic patches. The proposed metric shows high agreement with the established full-reference metrics PSNR and SSIM, indicated by Spearman rank correlations of 0.89 and 0.91, respectively. Thus, Q_v is able to replace full-reference metrics for quality assessment in the absence of a gold standard. For real data, our metric agrees reasonably with the visual inspection of a human operator in terms of image sharpness.

In our future work, we will study the adaptation of the proposed method to applications where image sharpness has to be continuously assessed, such as camera auto-focusing. As another application, we focus on the integration of Q_v as a feature into a classification-based quality rating in combination with different quality features. Our experiments using real data indicate that this may be feasible, and an extensive evaluation on large image databases is ongoing research.

Figure 6: Fundus images and corresponding scores Q_v: (a) Q_v = 0.0091, (b) Q_v = 0.0020, (c) Q_v = 0.0122, (d) Q_v = 0.0099. If the quality of the first acquisition was too low (first row), the examination was repeated (second row). Images of poor quality suffer either from a local loss of sharpness (a) or are globally degraded (b).

Figure 7: ROC curves for quality classification based on the different metrics.

Acknowledgment. The authors gratefully acknowledge funding of the Erlangen Graduate School in Advanced Optical Technologies (SAOT) by the German National Science Foundation (DFG) in the framework of the excellence initiative. This project is supported by the German Federal Ministry of Education and Research, project grant No. 01EX1011D, the European Regional Development Fund - Project FNUSA-ICRC (No. CZ.1.05/1.1.00/02.0123) and by the Czech-German project no. 7AMB12DE002 under the Ministry of Education, Youth and Sports.

References

[1] M. D. Abramoff, M. K. Garvin, and M. Sonka. Retinal imaging and image analysis. IEEE Reviews in Biomedical Engineering, 3:169-208, 2010.
[2] A. F. Frangi, W. J. Niessen, K. L. Vincken, and M. A. Viergever. Multiscale vessel enhancement filtering. In Medical Image Computing and Computer-Assisted Intervention (MICCAI) 1998, volume 1496 of Lecture Notes in Computer Science, pages 130-137. Springer Berlin Heidelberg, 1998.
[3] S. Gabarda and G. Cristobal. Blind image quality assessment through anisotropy. Journal of the Optical Society of America A, 24(12):B42-B51, 2007.
[4] A. G. Marrugo, M. S. Millan, G. Cristobal, S. Gabarda, and H. C. Abril. No-reference quality metrics for eye fundus imaging. In Proceedings of the 14th International Conference on Computer Analysis of Images and Patterns - Part I, pages 486-493, Seville, Spain, 2011. Springer-Verlag.
[5] A. G. Marrugo, M. Sorel, F. Sroubek, and M. S. Millan. Retinal image restoration by means of blind deconvolution. Journal of Biomedical Optics, 16(11):116016, 2011.
[6] M. Moscaritolo, H. Jampel, F. Knezevich, and R. Zeimer. An image based auto-focusing algorithm for digital fundus photography. IEEE Transactions on Medical Imaging, 28(11):1703-1707, 2009.
[7] N. Narvekar and L. Karam. A no-reference image blur metric based on the cumulative probability of blur detection (CPBD). IEEE Transactions on Image Processing, 20(9):2678-2683, 2011.
[8] M. Niemeijer, M. D. Abramoff, and B. van Ginneken. Image structure clustering for image quality verification of color retina images in diabetic retinopathy screening. Medical Image Analysis, 10(6):888-898, 2006.
[9] J. Paulus, J. Meier, R. Bock, J. Hornegger, and G. Michelson. Automated quality assessment of retinal fundus photos. International Journal of Computer Assisted Radiology and Surgery, 5(6):557-564, 2010.
[10] J. J. Staal, M. D. Abramoff, M. Niemeijer, M. A. Viergever, and B. van Ginneken. Ridge based vessel segmentation in color images of the retina. IEEE Transactions on Medical Imaging, 23:501-509, 2004.
[11] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600-612, 2004.
[12] X. Zhu and P. Milanfar. Automatic parameter selection for denoising algorithms using a no-reference measure of image content. IEEE Transactions on Image Processing, 19(12):3116-3132, 2010.
