Medical Image Analysis 93 (2024) 103098


CF-Loss: Clinically-relevant feature optimised loss function for retinal multi-class vessel segmentation and vascular feature measurement
Yukun Zhou a,b,c,*, MouCheng Xu a,d, Yipeng Hu a,d,f, Stefano B. Blumberg a,e, An Zhao a,e,
Siegfried K. Wagner b,c, Pearse A. Keane b,c, Daniel C. Alexander a,e
a Centre for Medical Image Computing, University College London, London WC1V 6LJ, UK
b NIHR Biomedical Research Centre, Moorfields Eye Hospital NHS Foundation Trust, London EC1V 9EL, UK
c Institute of Ophthalmology, University College London, London EC1V 9EL, UK
d Department of Medical Physics and Biomedical Engineering, University College London, London WC1E 6BT, UK
e Department of Computer Science, University College London, London WC1E 6BT, UK
f Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London W1W 7TS, UK

ARTICLE INFO

Keywords:
Multi-class vessel segmentation
Vascular feature
Loss function

ABSTRACT

Characterising clinically-relevant vascular features, such as vessel density and fractal dimension, can benefit biomarker discovery and disease diagnosis for both ophthalmic and systemic diseases. In this work, we explicitly encode vascular features into an end-to-end loss function for multi-class vessel segmentation, categorising pixels into artery, vein, uncertain pixels, and background. This clinically-relevant feature optimised loss function (CF-Loss) regulates networks to segment accurate multi-class vessel maps that produce precise vascular features. Our experiments first verify that CF-Loss significantly improves both multi-class vessel segmentation and vascular feature estimation, with two standard segmentation networks, on three publicly available datasets. We reveal that pixel-based segmentation performance is not always positively correlated with accuracy of vascular features, thus highlighting the importance of optimising vascular features directly via CF-Loss. Finally, we show that improved vascular features from CF-Loss, as biomarkers, can yield quantitative improvements in the prediction of ischaemic stroke, a real-world clinical downstream task. The code is available at https://github.com/rmaphoh/feature-loss.

1. Introduction

The significance of retinal vasculature for assessing ophthalmic disease has been well studied, such as venous beading as a hallmark for diagnosing diabetic retinopathy. Characterising vasculature with clinically-relevant vascular features can further provide valuable insights into systemic disease (Wagner et al., 2020; Cheung et al., 2021; Wong and Mitchell, 2004), a field which has been termed 'oculomics' (Wagner et al., 2020). For example, increased venular tortuosity is associated with hypertension and increased body-mass index (Owen et al., 2019; Cheung et al., 2011). Eye screening provides a non-invasive and economical window onto the microvasculature and central nervous system. Retinal multi-class vessel segmentation followed by a wide range of clinical feature estimation is considered a standard pipeline to explore disease markers and monitor the progression of ophthalmic and systemic diseases (Cheung et al., 2011; De Fauw et al., 2018; Seidelmann et al., 2016), as shown in Fig. 1.

The last two decades have produced a variety of software tools designed to measure vascular features for clinical applications and research (Wong et al., 2006; Perez-Rovira et al., 2011; Fraz et al., 2015; Shi et al., 2022; Zhou et al., 2022). Singapore I Vessel Assessment (SIVA) (Wong et al., 2006) was used to prove that larger retinal venular calibre is associated with risk of stroke in the Singapore Malay Eye Study. Vessel Assessment and Measurement Platform for Images of the REtina (VAMPIRE) (Perez-Rovira et al., 2011) verified that lower fractal dimension was associated with increased odds of albuminuria in the UK Biobank cohort study. Other recent software tools, such as the Quantitative Analysis of Retinal Vessel Topology and size (QUARTZ), focus on translating advanced segmentation methodologies into accurate feature measurement (Fraz et al., 2015; Zhou et al., 2022; Shi et al., 2022). For example, the open-source Automated Retinal Vascular Morphology Quantification (AutoMorph) used segmentation methods which achieved state-of-the-art performance in multi-class vessel segmentation (Zhou et al., 2022). All these software tools (SIVA, VAMPIRE, QUARTZ, AutoMorph, etc.) measure vascular features based

∗ Corresponding author at: Centre for Medical Image Computing, University College London, London WC1V 6LJ, UK.
E-mail address: yukun.zhou.19@ucl.ac.uk (Y. Zhou).

https://doi.org/10.1016/j.media.2024.103098
Received 30 August 2022; Received in revised form 22 May 2023; Accepted 30 January 2024
Available online 2 February 2024
1361-8415/© 2024 Published by Elsevier B.V.

Fig. 1. Pipeline of employing retinal vascular features for ophthalmic and systemic disease monitoring, including risk prediction of cardiovascular and neurodegenerative diseases.
After successive operations of eye screening, feature analysis, and risk prediction, the individuals at high risk of ocular and systemic diseases (highlighted in red) are identified
from a large cohort. Feature analysis includes multi-class vessel segmentation and vascular feature measurement, which substantially affect reliability and robustness.

on retinal multi-class vessel maps. Empirically, an accurate segmentation map is essential to obtain precise features. To achieve accurate vascular features, substantial recent efforts have focused on multi-class vessel segmentation (Mookiah et al., 2021; Chen et al., 2021a; You et al., 2022).

Traditional multi-class vessel segmentation involves two steps: vessel segmentation and artery/vein classification. Two main categories of methods, namely density feature-based (Huang et al., 2018; Mirsharif et al., 2013; Niemeijer et al., 2009; Xu et al., 2017; Niemeijer et al., 2011; Zamperini et al., 2012) and graph-based (Dashtbozorg et al., 2013; Estrada et al., 2015; Srinidhi et al., 2019; Xie et al., 2020; Zhao et al., 2019), segment vessels from retinal fundus photographs and classify the segmented pixels as arteries and veins, respectively with hand-crafted features and topological knowledge. Specifically, feature-based methods extract hand-crafted features from image pixels and use clustering techniques, such as the K-nearest neighbours (KNN) algorithm (Niemeijer et al., 2009; Xu et al., 2017), on these features to classify artery and vein. Graph-based methods include both feature information and topology knowledge to classify artery and vein.

Recently, the development of deep learning models has further boosted multi-class vessel segmentation performance, enabled by the powerful capability of representation learning (Hemelings et al., 2019; Galdran et al., 2019; Zhou et al., 2021). These deep learning-based methods segment arteries and veins directly from retinal fundus photographs using an end-to-end network. Welikala et al. (2017) combined convolution layers and fully connected layers for the first deep learning-based artery/vein segmentation. After that, more complex networks, such as U-net (Ronneberger et al., 2015) and its variants (Xu et al., 2018; Li et al., 2019; Galdran et al., 2019, 2022), have been employed in multi-class segmentation to achieve better segmentation performance. Recent methods further improve multi-class vessel segmentation by incorporating extra constraints in network training (Hu et al., 2019; Li et al., 2020; Zheng et al., 2021; Shit et al., 2021; Chen et al., 2021b, 2020; Zhou et al., 2021). One exemplar is that some researchers (Hu et al., 2019; Li et al., 2020; Chen et al., 2021b) devise topology-preserving loss functions for network training. This regulates the networks to segment vessels with correct topology, which can alleviate the issue of vessel fragments, a well-identified vessel segmentation challenge.

Previous work assumes that improving pixel-based segmentation metrics leads to precise vascular features, thus focusing on raising pixel-based metrics while ignoring the vascular feature quantification which directly contributes to biomarker discovery and disease diagnosis (Hemelings et al., 2019; Galdran et al., 2019; Zhou et al., 2021; Mookiah et al., 2021). Additionally, vascular features, as clinically meaningful characteristics of vasculature, can provide constraints to regulate multi-class vessel segmentation, but have not been exploited yet. Here, we propose a clinically-relevant feature optimised loss function (CF-Loss) to simultaneously optimise multi-class vessel segmentation and vascular feature measurement. The key advance is to incorporate in the training loss a term that explicitly optimises the ability to generate downstream clinical features (vessel density and fractal dimension) from the segmented vessels. We hypothesise that the end-to-end approach we propose both reduces errors in clinically-relevant features and enhances the underlying multi-class segmentation, avoiding the limitations of optimising simple pixel-based performance metrics. We demonstrate its potential using three public datasets and multiple network architectures. We summarise the contributions below:

• To our knowledge, this is the first work that explicitly encodes clinically-relevant vascular features as an end-to-end loss function. CF-Loss works as an extra constraint for network training, being complementary to pixel-based segmentation loss.
• We have verified the efficacy of CF-Loss in multi-class vessel segmentation, with two standard segmentation networks, U-Net and BF-Net, on three public datasets: the DRIVE-AV, LES-AV, and HRF-AV datasets.
• We report that vessel maps with high pixel-based segmentation metrics do not necessarily derive accurate vascular features, from three aspects: network backbone, segmentation category, and loss function, highlighting the importance of explicitly optimising vascular features with CF-Loss for downstream clinical tasks.
• We are the first to quantitatively verify the clinical downstream contribution brought by improving segmentation and feature measurement in retinal imaging. CF-Loss improves prediction of ischaemic stroke, a real-world downstream clinical task.

2. Related work

2.1. Artery/vein classification and segmentation

Artery/vein segmentation methods can be divided into two main categories, traditional methods and deep learning-based methods. Traditional methods first segment vessel pixels and then classify the pixels as arteries or veins. Depending on whether the classifiers are developed with artery/vein labels, traditional methods can be further categorised as supervised and unsupervised methods (Mookiah et al., 2021).

Supervised methods classify the artery and vein with trained classifiers, such as linear discriminant analysis (LDA) (Huang et al., 2018; Mirsharif et al., 2013; Niemeijer et al., 2011; Dashtbozorg et al., 2013), k-nearest neighbour (kNN) (Xu et al., 2017; Niemeijer et al., 2009), and random forest (Srinidhi et al., 2019). Mirsharif et al. (2013) investigated the variance and difference of pixel intensity of vessel walls and centrelines, as the input to a linear discriminant analysis classifier. Huang et al. (2018) collected 455 features for vessel centreline pixels and applied a generic search to choose informative features for artery/vein classification. Dashtbozorg et al. (2013) separated the segments between intersections as sub-graphs and classified each sub-graph as artery or vein according to intensity features. Xu et al. (2017) added first-order and second-order texture features for the kNN classifier. Srinidhi et al. (2019) exploited local information to disentangle the whole vessel map into multiple subtrees and used the random forest to classify these subtrees into arteries and veins. Zamperini et al. (2012) studied a mix of 16 features, including colour contrast inside and outside the vessels and spatial information, for artery/vein classification with a linear Bayes normal classifier. Estrada et al. (2015) incorporated global domain-specific features and a heuristic optimisation algorithm into a tree topology estimation framework to build a likelihood model for artery/vein classification. Unsupervised methods (Zhao et al., 2019; Relan et al., 2019; Joshi et al., 2014) mainly use clustering techniques and Gaussian mixture models to classify the arteries and veins. Zhao et al. (2019) employed an active contour model (Zhao et al., 2015) to segment the vessels and formalised the artery/vein classification as a pairwise clustering problem, which split the entire vessel graph into subtrees and classified artery/vein via dominant set clustering. Joshi et al. (2014) extracted vessel colour properties and classified the arteries and veins with a fuzzy C-means clustering algorithm. Relan et al. (2019) used orthogonal locality preserving projections to find the most dominant vessel features and classified the arteries and veins with a Gaussian mixture model with Expectation-Maximisation.

Deep learning-based methods have been widely explored for artery/vein segmentation (Hemelings et al., 2019; Galdran et al., 2019; Zhou et al., 2021). Unlike traditional feature-based and graph-based methods, deep learning-based methods can produce the multi-class segmentation map in an end-to-end manner. Welikala et al. (2017) used six layers, including three convolutional layers and three fully connected layers, to segment arteries and veins from image patches. Xu et al. (2018) configured a fully convolutional network and a domain-specific loss function to improve the overall performance. Li et al. (2019) combined U-net and multi-scale deep supervision for extracting artery/vein details. Galdran et al. (2019) revealed that uncertain pixels, e.g. pixels at vessel intersections, exist due to the limitation of 2D images. Galdran et al. (2022) devised a coarse-to-fine network which consists of two simplified U-nets to extract the artery and vein. Zhou et al. (2021) proposed to correct segmentation errors at the vessel intersections by information fusion between multiple network branches.

All these methods focus on improving multi-class vessel segmentation by optimising pixel-based loss functions, which rely on the intuitive assumption that higher pixel-based performance metrics correspond to more precise vascular features. In our study, we test this assumption and explicitly encode the calculation of vascular features in end-to-end loss functions, thus enhancing both multi-class vessel segmentation and vascular feature measurement.

2.2. Topological constraints in neural networks

Although deep learning-based methods achieve good performance in artery/vein segmentation, artery/vein connectivity is challenging to maintain with networks optimised by pixel-based segmentation loss functions. To this end, some recent work has focused on encoding topological knowledge in networks. The most common way is to construct a topological constraint for network training, thus learning to segment the artery and vein with good connectivity and topology (Chen et al., 2019; Shit et al., 2021; Zheng et al., 2021; Hu et al., 2019; Li et al., 2020; Chen et al., 2020, 2021b). Let S, T ∈ [0, 1] denote a segmentation map and a ground-truth map respectively. Chen et al. (2019) designed a loss combining length and region information to find an active contour which is a global minimisation of the active contour energy, i.e. Σ√((∇S_x)² + (∇S_y)² + ε) + Σ(S(1 − T)² + (1 − S)T²), where ∇S_x and ∇S_y indicate the pixel gradients in the x and y axes respectively. Zheng et al. (2021) reformulated the graph cuts loss function, which focused on a dual penalty to optimise the regional properties (binary cross entropy between S and T) and the boundary regularisation Σ{1 − |S_u − S_v|} · δ(T_u, T_v), where (u, v) represents an adjacent pixel pair and δ(T_u, T_v) = 1 if T_u ≠ T_v. Shit et al. (2021) proposed centreline-based Dice, clDice = Dice(ske(S), ske(T)), where ske(·) skeletonises the vessels to centrelines and Dice(·) calculates the Dice score, to evaluate the accuracy of the segmented centreline, and theoretically proved that clDice guarantees topology preservation up to homotopy equivalence for segmentation.

Inspired by these topology-based constraints, we propose a new regularisation with vascular features to improve biomarker quantification, which directly contributes to downstream clinical tasks. Moreover, a diverse range of vascular features supplies additional constraints absent from topological constraints, such as vessel density and fractal dimension.

2.3. Clinically-relevant vascular features

A wide range of vascular features, including fractal dimension and vessel density, have been proven as potential biomarkers for ocular diseases and systemic diseases.

Fractal dimension characterises vasculature shape complexity. Liew et al. (2011) found the lowest and highest quartiles of retinal vascular fractal dimension had higher 14-year coronary heart disease mortality. Liew et al. (2008) discovered that fractal dimension was statistically significantly lower in participants with than without hypertension. Cheung et al. (2009) recognised that greater retinal fractal dimension was independently associated with higher odds of early diabetic retinopathy in type 1 diabetes, which indicated that fractal dimension may allow quantitative measurement of early diabetic microvascular damage. Shi et al. (2020) found that retinal capillary complexity in the inferior quadrant was negatively correlated with visual acuity and Parkinson disease duration.

Vessel density quantifies the ratio of vessel pixels to the whole image. Chua et al. (2019) found that sparser vessel density was associated with higher systolic blood pressure. Chang et al. (2019) recognised that reduced vessel density was linked with age and longer diabetes duration in the African American Eye Disease Study. Yu et al. (2017) observed that retinal vessel density decreased with greater severity of obstructive sleep apnoea-hypopnoea syndrome. Akil et al. (2017) found a significant difference in vessel density between normal eyes and those with early glaucoma.

These studies all show the great potential of vascular features in clinical research. What motivates our work is robust automated estimation of these features, which current image analysis algorithms do not optimise. These vascular features are calculated with standard formulas and have not been optimised in end-to-end networks due to some non-differentiable operations. In our work, we propose approximations and validate the performance with multiple datasets and network architectures.

3. Methods

We explicitly encode the vascular features as end-to-end loss functions, as shown in Fig. 2. Among a wide range of informative vascular features (Zhou et al., 2022; Wagner et al., 2020), we focus on two specific features: vessel density and fractal dimension (box count), which we favour for their computational efficiency. However, the concept extends to other image-based vascular features.

We denote S, T ∈ [0, 1]^(h×w) as a segmentation map and a ground-truth map respectively, where h and w indicate the height and width of the image.

3.1. Vessel density loss function

The vessel density function v measures the ratio of vessel area A to the whole image area, e.g. v(S) = A(S > 0)/(h × w), where S > 0 selects the potential vessel area. The vessel density loss Loss_V(S, T) can be expressed as

Loss_V(S, T) = |A(S > 0)/(h × w) − A(T > 0)/(h × w)| = |ΣS/(h × w) − ΣT/(h × w)|.   (1)
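As a minimal illustration of Eq. (1) and its per-class extension in Eq. (2), the sketch below computes the vessel density difference on toy artery and vein probability maps. This is a hedged numpy sketch for intuition only; the paper's released implementation is in PyTorch, and the function names here are illustrative, not from the official code.

```python
import numpy as np

def vessel_density(seg):
    """Vessel density v(S): (soft) vessel area divided by image area."""
    h, w = seg.shape
    return seg.sum() / (h * w)

def loss_v(seg_a, gt_a, seg_v, gt_v):
    """Vessel density loss in the style of Eq. (2): absolute density
    error, summed over the artery and vein channels."""
    return (abs(vessel_density(seg_a) - vessel_density(gt_a))
            + abs(vessel_density(seg_v) - vessel_density(gt_v)))

# toy 4x4 maps: ground truth has 4 artery pixels, prediction finds only 2
gt_a = np.zeros((4, 4)); gt_a[0, :4] = 1.0
pred_a = np.zeros((4, 4)); pred_a[0, :2] = 1.0
gt_v = np.zeros((4, 4)); pred_v = np.zeros((4, 4))
print(loss_v(pred_a, gt_a, pred_v, gt_v))  # |2/16 - 4/16| + 0 = 0.125
```

Because only sums over probability maps are involved, this term is trivially differentiable, unlike the box count discussed next.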


Fig. 2. Schematic of the proposed loss function CF-Loss. We measure two vascular features with Algorithm 1 and encode them into loss functions, the vessel density loss Loss_V (Eq. (2)) and the box count loss Loss_B (Eq. (6)), which regulate the segmentation network to segment vessel maps from which accurate fractal dimension and vessel density can be derived.
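The core of the box-count measurement that Algorithm 1 formalises below is zero-padding followed by average pooling at each box size. The following is a hedged numpy sketch of one box-size iteration, with a raw (non-differentiable) count and the first soft count; the paper's implementation uses PyTorch pooling operators, and the helper name is illustrative.

```python
import numpy as np

def soft_box_count(seg, k):
    """One iteration of the box-count loop for box size k: zero-pad S to a
    multiple of k, average-pool with kernel and stride k, then count boxes
    containing vessel. Returns (raw count N, first soft count N_hat)."""
    h, w = seg.shape
    ph, pw = (-h) % k, (-w) % k                 # zero-padding amounts
    s = np.pad(seg, ((0, ph), (0, pw)))
    # average pooling with kernel size k and stride k via reshape
    pooled = s.reshape(s.shape[0] // k, k, s.shape[1] // k, k).mean(axis=(1, 3))
    occupied = pooled > 0                        # boxes that include vessel
    n_raw = int(occupied.sum())                  # non-differentiable count
    n_soft = float(pooled[occupied].sum())       # differentiable surrogate
    return n_raw, n_soft

# a 4x4 map with one vessel pixel: exactly one occupied 2x2 box
seg = np.zeros((4, 4)); seg[0, 0] = 1.0
print(soft_box_count(seg, 2))  # → (1, 0.25)
```

The raw count uses a hard threshold (pooled > 0), which has zero gradient almost everywhere; the soft counts keep the pooled vessel ratios so that gradients flow back to the segmentation probabilities.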

Algorithm 1 Vessel density and soft box count
Input: Segmentation map S ∈ R^(h×w)
 1: v(S) = Σ_{i=1..h} Σ_{j=1..w} S_{i,j} / (h × w)                ▷ calculate vessel density
 2: ε = {2^i | i ∈ Z, 2 ≤ 2^i ≤ min{h, w}}                        ▷ a set of box sizes
 3: N(S, ε), N̂(S, ε), Ñ(S, ε) = [], [], []                        ▷ box count lists
 4: for k ← ε do                                                   ▷ iterate box size
 5:     S′ = zeropad(0, k − h%k, 0, k − w%k)(S)
 6:     S′ = averpool(kernelsize = k, stride = k)(S′)
 7:     c = S′ > 0                                                 ▷ index of boxes including vessels
 8:     N(S, ε).append(Σ_{i=1..h/k} Σ_{j=1..w/k} (1_c)_{i,j})      ▷ raw box count
 9:     N̂(S, ε).append(Σ_{i=1..h/k} Σ_{j=1..w/k} (S′_c)_{i,j})     ▷ first soft box count
10:     Ñ(S, ε).append((S′_c)_{i,j})                                ▷ second soft box count
11: end for
12: return v(S), N(S, ε), N̂(S, ε), Ñ(S, ε)

For the multi-class segmentation task, we get a multi-class segmentation map with four channels S_m = (S_b, S_a, S_v, S_u), which respectively correspond to the probability maps of background S_b, artery S_a, vein S_v, and uncertain pixels S_u. We calculate the vessel density loss function for artery and vein for network training:

Loss_V(S_m, T_m) = |(ΣS_a − ΣT_a)/(h × w)| + |(ΣS_v − ΣT_v)/(h × w)|,   (2)

where T_a and T_v indicate the ground truth of artery and vein.

3.2. Box count loss function

The fractal dimension function f evaluates the vessel morphology complexity. The Minkowski-Bouligand dimension (also known as the box-counting dimension) is commonly used in determining fractal dimension and characterising retinal vasculature (Falconer, 2004). For a vessel map T, the fractal dimension is

f(T) = lim_{ε→0} log N(T, ε) / log(1/ε),   (3)

where ε indicates the square box size and N(T, ε) represents the number of boxes required to cover the segmented vessels in T. ε needs to be infinitely close to 0 in Eq. (3).

In practice, we choose a set of box sizes ε = {2^i | i ∈ Z, 2 ≤ 2^i ≤ min{h, w}} as the x axis, and N(T, ε) as the y axis. Least-square regression is used to fit a straight line whose slope approximates f(T). We denote the error on the y axis

Δy = log(N(T, ε)) − log(N(S, ε)) = log(N(T, ε)/N(S, ε)).   (4)

When ε is small, N(T, ε) ≈ area(T > 0)/ε², thus Eq. (4) can be rewritten as

Δy = log(N(T, ε)/N(S, ε)) ≈ log((area(T > 0)/ε²)/(area(S > 0)/ε²)) = log(area(T > 0)/area(S > 0)),   (5)

which shows that Δy is a constant determined by the vessel areas of S and T. This indicates that the two fitted lines (respectively for ground truth T and segmentation S) are approximately parallel, i.e. the approximated fractal dimensions f(T) and f(S) are close. In this case, the error |f(T) − f(S)| is tiny even when S is far from T. A network θ trained by minimising |f(T) − f(S)| has gradient ∂|f(T) − f(S)|/∂θ ≈ 0, which faces the issue of gradient vanishing.

Instead, we identify that the box counts N(T, ε) and N(S, ε) are appropriate candidates for loss construction because (1) accurate N(S, ε) derives precise f(S), as f(S) is approximated by the fitted slope between N(S, ε) and ε; (2) it avoids the gradient vanishing issue, as |N(T, ε) − N(S, ε)| monotonically increases when S is more different from T; and (3) the varying box size ε introduces regularisation among pixels in receptive fields of various sizes, i.e. large boxes contribute semantic information in large receptive fields, while small boxes contain high-resolution information in local areas. We calculate the box count loss function Loss_B(S, T) with a set of box sizes ε = {2^i | i ∈ Z, 2 ≤ 2^i ≤ min{h, w}} as

Loss_B(S, T) = (1/√(Σ_i ε_i²)) · Σ_i ε_i · ((N(T, ε_i) − N(S, ε_i))/N(T, ε_i))²,   (6)

where (N(T, ε) − N(S, ε))/N(T, ε) normalises multi-scale box errors to the same magnitude. The factor ε_i is empirically configured to weight the errors of large-size boxes more, to regularise semantic information. The factor 1/√(Σ_i ε_i²) scales Loss_B(S, T) to the same level of magnitude as Loss_V(S, T). For multi-class vessel segmentation Loss_B(S_m, T_m), we sum up the box count loss functions for artery Loss_B(S_a, T_a) and vein Loss_B(S_v, T_v).

3.3. Differentiable CF-Loss for model training

All the calculations for vessel density v(S) and box count N(S, ε) need to be differentiable to ensure gradients, as shown in Algorithm 1. We approximate the box count N(S, ε) with two kinds of soft box count, N̂(S, ε) and Ñ(S, ε), as the raw box count N(S, ε) is not differentiable. By combining the operations of zero-padding and average pooling, the vessel ratio in each box can be calculated to select the target boxes. The first soft box count N̂(S, ε) substitutes the binary threshold with an average vessel ratio, while the second soft box count calculates the box-wise distance between S and T by |Ñ(S, ε) − Ñ(T, ε)| and then sums across all boxes. We randomly sample 1000 pairs of patches (S, T) from multi-class vessel maps and calculate the box count loss Loss_B(S, T) respectively with N(S, ε), N̂(S, ε), and Ñ(S, ε). The correlation is shown in Fig. 3, where Loss_B with the two soft box counts approximates a linear function of Loss_B with the raw box counts.

Fig. 3. Linear regression of Loss_B with different box counts. The x axis is Loss_B with the raw box count N(S, ε) and the y axis is Loss_B with the first soft box count N̂(S, ε) (left side) and the second soft box count Ñ(S, ε) (right side).

In model training, the inputs of the proposed loss functions Loss_V and Loss_B are the multi-class vessel segmentation map S_m after softmax activation and the ground-truth map T_m. Loss_B and Loss_V can be used for end-to-end network training. A combination of a pixel-based loss function and complementary regularisation is effective and widely employed in previous work (Shit et al., 2021; Zheng et al., 2021). To simultaneously achieve accurate multi-class segmentation maps and vascular features, we combine the feature-based loss functions, Loss_V and Loss_B, and the pixel-based cross entropy loss (Loss_CE) to construct CF-Loss in three ways:

Loss_CF-B = Loss_CE + λ · Loss_B,   (7)
Loss_CF-V = Loss_CE + β · Loss_V,   (8)
Loss_CF-VB = Loss_CE + λ · Loss_B + β · Loss_V,   (9)

where λ and β are the loss weights for the box count loss Loss_B and the vessel density loss Loss_V respectively. By adjusting λ and β, the effects of introducing either a single feature-based loss, Loss_V or Loss_B, or their combination can be investigated.

4. Experiments

4.1. Experiment data

We use three publicly available datasets, DRIVE-AV (Staal et al., 2004; Hu et al., 2013), LES-AV (Orlando et al., 2018), and HRF-AV (Budai et al., 2013; Hemelings et al., 2019), to verify the performance of multi-class vessel segmentation and feature measurement. All three datasets provide labels for the background, artery, vein, and uncertain classes. Following the labelling strategy of the HRF-AV dataset, all unidentifiable vessels and crossings between arteries and veins are defined as the uncertain class. DRIVE-AV has 40 colour fundus photographs of size (565, 584) collected with a CR5 non-mydriatic 3CCD camera (Canon), where 20 images are for training and 20 for testing. LES-AV contains 22 images of size (1620, 1444), 11 for training and 11 for testing. The imaging device for LES-AV is a Visucam Pro NM fundus camera (Carl Zeiss Meditec, Jena, Germany). HRF-AV includes 45 images of size (3504, 2336) imaged with a CF-60UVi camera (Canon), 24 for training and 21 for testing. Additionally, the images of DRIVE-AV and HRF-AV are macula-centred while those of LES-AV are disc-centred, imaging the retinal vessels in different areas. With the diverse characteristics of the three datasets, the efficacy of the proposed CF-Loss can be evaluated on test data of different image contrast, size, and laterality.

We include a subset of data from the AlzEye project (Wagner et al., 2022) to verify the clinical contribution brought by CF-Loss. The AlzEye project is a retrospective cohort study of patients aged 40 years and over who attended Moorfields Eye Hospital between 1 January 2008 and 1 April 2018. Patients were included if they had attended the glaucoma, retina, neuro-ophthalmology or emergency ophthalmic services. Systemic health data were derived from Hospital Episode Statistics data relating to admitted patient care, with a focus on cardiovascular disease and all-cause dementia (Wagner et al., 2022). Specifically, ischaemic stroke was defined as code I63/I64 according to the International Classification of Diseases 10th revision (Schnier et al., 2019). Here, we set a task of predicting three-year ischaemic stroke incidence using derived retinal vascular features. Macula-centred retinal colour images of 1548 patients (774 with stroke and 774 controls) originating from the AlzEye project are used in this task. All the images were from the imaging device Topcon 3DOCT-2000SA and collected from the left eye. A logistic regression model was trained with 60% of the data to predict stroke incidence with the input of a single vascular feature, such as fractal dimension or vessel density, and tested with the remaining 40% of the data.

4.2. Backbone network and training details

CF-Loss can be used in any network training. We use U-Net (Ronneberger et al., 2015) and BF-Net (Zhou et al., 2021) as example segmentation backbones. We substitute each backbone's original loss with CF-Loss and compared against recent losses: Active Contour (AC) (Chen et al., 2019), GraphCut (GC) (Zheng et al., 2021), and clDice (Shit et al., 2021). AC incorporated area and size information of regions of interest in the loss function. GC transferred the graph cut cost function as an extra penalty to optimise boundary accuracy. clDice extracted the skeleton and regulated network training with the Dice score on the skeleton. All compared methods achieved competing performance in a wide range of medical tasks, including retinal binary vessel segmentation. However, the multi-class vessel segmentation in our experiment is much more challenging, as the arteries and veins share a highly similar distribution in pixel density and morphology. For each compared loss function, we use the official code and calculate the loss respectively for arteries and veins and sum them up, following the same way of calculating Loss_V and Loss_B. As clDice is computationally expensive, we halve the BF-Net channels for all loss functions for a fair comparison.

We set the final activation function as softmax. The segmentation output is a class probability map including four categories: background, artery, vein, and uncertain pixel. The uncertain category refers to pixels that cannot be discriminated due to limited information, such as the intersections between arteries and veins.

In training, all training images are resized to (720, 720) to fit computation resources. For hyperparameters, the learning rate is 0.0008, the batch size is 2, and the total number of epochs is 500. The optimiser is Adam (Kingma and Ba, 2014) with β1 = 0.5 and β2 = 0.999. We use flipping, rotation, and colour enhancement for data augmentation. 10% of the training images are used for validation to schedule the learning rate and training process. The learning rate halves when the training loss has stopped decreasing for 50 continuous epochs. When the loss has not decreased for 200 epochs, the training is stopped, and the model is saved when the validation achieves a higher F1-score (Dice). We report the performance as mean±sd over eight models trained with different subsets of training data. All the performance improvements (%) reported in Section 4.4 are absolute changes. For the loss weights λ and β, we respectively set them as 0.5 and 1 based on the highest F1-score on validation data, whilst the test data remains unseen in model development.

Y. Zhou et al. Medical Image Analysis 93 (2024) 103098

Fig. 4. Visualisation examples on DRIVE-AV, HRF-AV, and LES-AV. From left to right: colour fundus photographs, the ground-truth map, and segmentations with cross entropy, with clDice, and with a representative of the proposed CF-Loss (𝐿𝑜𝑠𝑠𝐶𝐹−𝐵 in Eq. (7)). Local comparisons are highlighted in the white dashed boxes.

The majority of experiment results with 𝐿𝑜𝑠𝑠𝐶𝐹−𝐵 are based on the second soft box count 𝑁̃(𝑆, 𝜀), as it performs better in validation. We implement the code using PyTorch 1.9 and use a Tesla T4 GPU (16 GB) in all experiments.

4.3. Evaluation metrics

We evaluate the standard multi-class segmentation performance, instead of artery/vein classification. This precisely evaluates how the methods perform in segmenting multi-class vessels. The segmentation map is resized back to the original size for calculating metrics. After taking the maximum of the generated softmax probability map, each pixel is classified as one of the four categories: background, artery, vein, and uncertain pixel. We calculate the F1-score (Dice) as 𝐹 = (𝑛𝑎 ⋅ 𝐹𝑎 + 𝑛𝑣 ⋅ 𝐹𝑣 + 𝑛𝑢 ⋅ 𝐹𝑢)∕(𝑛𝑎 + 𝑛𝑣 + 𝑛𝑢), where 𝑛𝑎, 𝑛𝑣, and 𝑛𝑢 respectively represent the number of pixels belonging to the artery, vein, and uncertain classes in the label. 𝐹𝑎, 𝐹𝑣, and 𝐹𝑢 are the F1-scores in each binary measurement, e.g. artery pixels versus all other pixels. We also calculate the sensitivity, mean square error (MSE), and intersection over union (IOU) for multi-class vessel segmentation.

For evaluating derived vascular features, we measure the agreement of estimated vascular features with ground-truth features derived from the manual multi-class annotation. Following previous work (Cheung et al., 2021), the intra-class correlation coefficient (ICC) is used to evaluate the feature agreement. Specifically, we evaluate the ICC for fractal dimension and vessel density, for artery and vein. To study the efficacy of CF-Loss on topology correctness, we measure the Betti number error (Hu et al., 2019; Shit et al., 2021).

To evaluate the real-world clinical benefits of CF-Loss, we show the stroke classification performance of the logistic regression model by AUC-ROC and AUC-PR. p-values from the Mann–Whitney U test are reported when statistical comparisons are made.

4.4. Experiment results

4.4.1. Comparison among three CF-Loss
We first compare the performance of the three formats of CF-Loss (Eqs. (7), (8), and (9)). Table 1 and Table 3 show the multi-class vessel segmentation results on the three datasets, respectively with backbones U-Net and BF-Net. With U-Net (Table 1), CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹−𝑉 achieved the best pixel-based segmentation metrics, including sensitivity, F1-score, IOU, and MSE. 𝐿𝑜𝑠𝑠𝐶𝐹−𝑉 also performed well in topology correctness, ranking first on the DRIVE-AV and HRF-AV datasets. For feature agreement of fractal dimension ICCF and vessel density ICCV, the CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹−𝑉𝐵 achieved the best performance on the three datasets. The 𝐿𝑜𝑠𝑠𝐶𝐹−𝐵 ranked in the middle for both pixel-based segmentation metrics and feature agreement. When the network backbone is BF-Net (Table 3), the CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹−𝐵 and 𝐿𝑜𝑠𝑠𝐶𝐹−𝑉𝐵 showed comparable pixel-based segmentation performance, better than 𝐿𝑜𝑠𝑠𝐶𝐹−𝑉 on the DRIVE-AV and LES-AV datasets. 𝐿𝑜𝑠𝑠𝐶𝐹−𝐵 also achieved the highest feature agreement across the three datasets. This observation extends to the artery and vein segmentation metrics (Table 4 and Table 5). We select 𝐿𝑜𝑠𝑠𝐶𝐹−𝐵 as a representative for the comparison with recent loss functions and the following stroke prediction.

4.4.2. Comparison to recent loss functions
From Table 1 and Table 3, all three CF-Loss variants showed better performance in all metrics compared to AC, GC, and clDice. With U-Net, 𝐿𝑜𝑠𝑠𝐶𝐹−𝐵 increased F1-score by 2.88%, 2.06%, and 2.03% respectively on the DRIVE-AV, LES-AV, and HRF-AV datasets compared to clDice. 𝐿𝑜𝑠𝑠𝐶𝐹−𝐵 substantially improved the feature agreement ICCV by 8%, 7%, and 8% and decreased the Betti error by 3.06, 3.66, and 2.78 respectively on the three datasets, compared to clDice. With BF-Net, 𝐿𝑜𝑠𝑠𝐶𝐹−𝐵 increased F1-score by 2.77%, 2.67%, and 2.36% and feature agreement ICCV by 9%, 12%, and 12% respectively on the three datasets, compared to clDice. This improvement in multi-class segmentation performance comprises


Fig. 5. Performance on DRIVE-AV with different 𝜆 and 𝛽 values for the CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹−𝐵 in the first column, 𝐿𝑜𝑠𝑠𝐶𝐹−𝑉 in the second column, and 𝐿𝑜𝑠𝑠𝐶𝐹−𝑉𝐵 in the last two columns. The first row shows the F1-score and the second row shows the agreement of fractal dimension (ICCF) with ground truth.

Fig. 6. ROC and PR curves for predicting stroke incidence with (a) artery fractal dimension, (b) artery vessel density, (c) vein fractal dimension, and (d) vein vessel density from
different segmentation loss functions. AUC-ROC and AUC-PR are listed in legends. Bootstrapped confidence intervals, [5th, 95th] percentiles of AUC-ROC and AUC-PR, are plotted
in corresponding colour shades.

the advances in segmenting arteries and veins (Table 4 and Table 5). We show three segmentation examples for the three datasets in Fig. 4. The multi-class vessel maps from CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹−𝐵 are closest to the ground truth maps. The patches highlighted with white dashed boxes show that 𝐿𝑜𝑠𝑠𝐶𝐹−𝐵 helps with vessel connectivity and the correctness of arteries and veins.

4.4.3. Ablation 1: Impact of loss weights
The loss weights 𝜆 and 𝛽 were set according to the validation performance. We study their effect on test performance. Fig. 5 depicts the performance on the DRIVE-AV dataset, with backbone U-Net (red line) and BF-Net (blue line), with different loss weights 𝜆 and 𝛽. When 𝜆 = 0 and 𝛽 = 0, it reduces to the cross entropy (CE) loss. The introduction of CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹−𝐵 (𝜆 = 0.1 and 𝜆 = 0.2 in the first column of Fig. 5) rapidly increases the F1-score and fractal dimension agreement. After 𝜆 exceeds 0.2, the performance saturates. For 𝐿𝑜𝑠𝑠𝐶𝐹−𝑉 in the second column, the F1-score and ICC become stable when 𝛽 = 0.6.

The performance of 𝐿𝑜𝑠𝑠𝐶𝐹−𝑉𝐵 is relatively stable, as shown in the third and fourth columns. With the increase of 𝜆 (larger weights for 𝐿𝑜𝑠𝑠𝐶𝐹−𝐵), ICCF shows a growing trend while the F1-score slightly declines. From the fourth column, 𝛽 (the loss weight for 𝐿𝑜𝑠𝑠𝐶𝐹−𝑉) works in the opposite way. This is also supported by Table 1 and Table 3, where 𝐿𝑜𝑠𝑠𝐶𝐹−𝐵 usually performed better in feature agreement while 𝐿𝑜𝑠𝑠𝐶𝐹−𝑉 showed advantages in pixel-based segmentation metrics in some cases. The combination of 𝜆 and 𝛽 presumably offers a weak tradeoff between segmentation metrics and vascular feature agreement on test data.

4.4.4. Ablation 2: Effects of soft box count and box size weight
We compare the performance of the two soft box counts, 𝑁̂(𝑆, 𝜀) and 𝑁̃(𝑆, 𝜀), which are introduced in Section 3.3. The comparison results on the DRIVE-AV dataset are shown in Fig. 7, between the blue boxes (𝑁̃(𝑆, 𝜀)) and green boxes (𝑁̂(𝑆, 𝜀)) in each sub-graph. The soft box count 𝑁̃(𝑆, 𝜀) provided a higher F1-score and lower MSE than 𝑁̂(𝑆, 𝜀).

In Eq. (6), we empirically use the box size as weights for 𝐿𝑜𝑠𝑠𝐵. To validate the efficacy of the box size weight, we compare the performance of 𝐿𝑜𝑠𝑠𝐶𝐹−𝐵 with and without the box size weight, i.e. 𝐿𝑜𝑠𝑠𝐵′ = √(Σ𝑖 ((𝑁(𝑇, 𝜀𝑖) − 𝑁(𝑆, 𝜀𝑖))∕𝑁(𝑇, 𝜀𝑖))²). The results are shown in Fig. 7. With the box


Fig. 7. Performance comparison between 𝐿𝑜𝑠𝑠𝐶𝐹−𝐵 with and without the box size weight, as well as using the first soft box count 𝑁̂(𝑆, 𝜀), on the DRIVE-AV dataset.
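The tables below report the label-frequency-weighted F1-score defined in Section 4.3, F = (n_a·F_a + n_v·F_v + n_u·F_u)/(n_a + n_v + n_u). A minimal NumPy sketch, with an illustrative class-index convention (not specified in this excerpt):

```python
import numpy as np

def weighted_f1(pred, label, classes=(1, 2, 3)):
    """Label-frequency-weighted F1 over the vessel classes, i.e.
    F = (n_a*F_a + n_v*F_v + n_u*F_u) / (n_a + n_v + n_u).
    `pred` and `label` are integer class maps; the index convention
    (0 background, 1 artery, 2 vein, 3 uncertain) is illustrative."""
    num, den = 0.0, 0
    for c in classes:
        p, t = pred == c, label == c            # one-vs-rest binarisation
        tp = np.logical_and(p, t).sum()
        support = p.sum() + t.sum()
        f1 = 2.0 * tp / support if support else 0.0
        n_c = t.sum()                           # class-c pixels in the label
        num += n_c * f1
        den += n_c
    return num / den if den else 0.0
```

Classes absent from the label (here, e.g. the uncertain class in a patch without crossings) receive zero weight and do not affect the score.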

Table 1
Multi-class segmentation performance with U-Net on DRIVE-AV, LES-AV, and HRF-AV. Betti error evaluates the topological correctness of segmentation maps. ICCF evaluates the agreement of the fractal dimension with that derived from ground-truth maps, and ICCV does so for vessel density. The p-value of the Mann–Whitney U test between CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹−𝐵 and clDice is reported, as clDice is the most competitive loss function among the others.
DRIVE-AV
Loss Sensitivity F1-score ↑ IOU ↑ MSE ↓ Betti error↓ ICCF (95%𝐶𝐼) ↑ ICCV (95%𝐶𝐼) ↑
AC (Chen et al., 2019) 65.13 ± 1.85 68.4 ± 1.42 53.29 ± 1.69 3.18 ± 0.13 10.64 ± 0.84 0.58(0.16–0.77) 0.55(0.12–0.75)
GC (Zheng et al., 2021) 65.94 ± 1.08 69.23 ± 2.05 53.03 ± 2.37 3.27 ± 0.22 23.22 ± 2.34 0.64(0.27–0.81) 0.62(0.25–0.82)
clDice (Shit et al., 2021) 68.78 ± 0.96 70.34 ± 1.11 55.64 ± 1.38 3.05 ± 0.15 10.98 ± 0.92 0.61(0.32–0.78) 0.64(0.28–0.83)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝐵 71.37 ± 0.75 73.22 ± 0.98 58.25 ± 1.21 2.85 ± 0.01 7.92 ± 1.02 𝟎.𝟕𝟖(𝟎.𝟒𝟓–𝟎.𝟗𝟐) 𝟎.𝟕𝟐(𝟎.𝟑𝟔–𝟎.𝟗𝟑)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 73.27 ± 1.24 74.26 ± 0.66 59.5 ± 0.84 2.77 ± 0.06 6.73 ± 0.46 𝟎.𝟕𝟔(𝟎.𝟒𝟐–𝟎.𝟗𝟐) 𝟎.𝟔𝟗(𝟎.𝟑𝟖–𝟎.𝟖𝟔)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 𝐵 71.93 ± 0.11 73.79 ± 0.81 58.93 ± 1.02 2.78 ± 0.11 7.07 ± 1.32 𝟎.𝟖𝟑(𝟎.𝟓𝟓–𝟎.𝟗𝟏) 𝟎.𝟖𝟔(𝟎.𝟔𝟐–𝟎.𝟗𝟐)
p-value 2.39e−2 1.36e−3 5.38e−3 2.39e−2 1.36e−3 N/A N/A
LES-AV
Loss Sensitivity F1-score ↑ IOU ↑ MSE ↓ Betti error↓ ICCF (95%𝐶𝐼) ↑ ICCV (95%𝐶𝐼) ↑
AC (Chen et al., 2019) 59.15 ± 2.57 62.83 ± 2.32 47.4 ± 2.6 2.88 ± 0.16 8.42 ± 0.75 0.67(0.24–0.84) 0.62(0.21–0.82)
GC (Zheng et al., 2021) 60.32 ± 1.98 63.69 ± 1.78 47.99 ± 1.91 2.83 ± 0.14 10.69 ± 2.84 0.67(0.18–0.86) 0.62(0.22–0.8)
clDice (Shit et al., 2021) 61.04 ± 2.49 63.87 ± 1.94 48.55 ± 1.9 2.86 ± 0.12 8.42 ± 1.31 0.66(0.28–0.81) 0.64(0.35–0.77)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝐵 62.21 ± 2.14 65.93 ± 1.32 50.66 ± 1.51 2.61 ± 0.25 4.76 ± 1.15 𝟎.𝟕𝟐(𝟎.𝟑𝟑–𝟎.𝟗𝟒) 𝟎.𝟕𝟏(𝟎.𝟑𝟐–𝟎.𝟗)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 62.47 ± 2.52 66.4 ± 1.74 51.11 ± 2.64 2.56 ± 0.16 5.34 ± 1.15 𝟎.𝟔𝟗(𝟎.𝟐𝟐–𝟎.𝟖𝟓) 𝟎.𝟕(𝟎.𝟑𝟖–𝟎.𝟖𝟗)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 𝐵 61.51 ± 1.56 64.71 ± 1.44 49.7 ± 1.89 2.69 ± 0.18 4.78 ± 1.07 𝟎.𝟕𝟖(𝟎.𝟒𝟕–𝟎.𝟗𝟏) 𝟎.𝟕𝟑(𝟎.𝟒𝟐–𝟎.𝟗)
p-value 1.36e−3 2.17e−3 7.29e−3 2.15e−2 3.13e−3 N/A N/A
HRF-AV
Loss Sensitivity F1-score ↑ IOU ↑ MSE ↓ Betti error↓ ICCF (95%𝐶𝐼) ↑ ICCV (95%𝐶𝐼) ↑
AC (Chen et al., 2019) 65.37 ± 1.36 69.93 ± 0.98 55.11 ± 0.88 2.11 ± 0.05 9.31 ± 0.48 0.78(0.59–0.85) 0.72(0.39–0.85)
GC (Zheng et al., 2021) 66.75 ± 1.21 70.48 ± 0.63 55.73 ± 0.82 2.18 ± 0.07 12.84 ± 2.4 0.74(0.51–0.83) 0.69(0.36–0.84)
clDice (Shit et al., 2021) 67.55 ± 1.92 70.14 ± 1.04 56.12 ± 0.67 2.13 ± 0.04 9.39 ± 0.46 0.78(0.62–0.83) 0.73(0.63–0.92)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝐵 69.41 ± 1.75 72.17 ± 0.66 57.74 ± 0.73 1.91 ± 0.02 6.61 ± 0.52 𝟎.𝟖𝟐(𝟎.𝟔𝟖–𝟎.𝟗𝟐) 𝟎.𝟖𝟏(𝟎.𝟔𝟓–𝟎.𝟗𝟏)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 69.85 ± 1.7 72.47 ± 0.38 58.06 ± 0.41 1.89 ± 0.05 6.38 ± 0.39 𝟎.𝟕𝟖(𝟎.𝟔𝟓–𝟎.𝟖𝟖) 𝟎.𝟕𝟗(𝟎.𝟓𝟕–𝟎.𝟗𝟓)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 𝐵 68.89 ± 2.24 71.82 ± 1.14 57.3 ± 0.94 1.93 ± 0.05 6.49 ± 0.69 𝟎.𝟖𝟒(𝟎.𝟕𝟏–𝟎.𝟗𝟓) 𝟎.𝟖𝟑(𝟎.𝟔𝟒–𝟎.𝟗𝟒)
p-value 2.17e−3 1.36e−3 1.36e−3 9.39e−4 3.67e−3 N/A N/A
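The ICCF columns above measure agreement on the box-counting fractal dimension. A minimal estimator sketch (slope of log N(ε) against log(1/ε)); the box-size grid is illustrative, as the paper's exact grid is not given in this excerpt:

```python
import numpy as np

def fractal_dimension(mask, sizes=(2, 4, 8, 16)):
    """Box-counting fractal dimension of a binary vessel map: the slope
    of log N(eps) against log(1/eps), where N(eps) is the number of
    eps-by-eps boxes containing foreground.  Standard estimator; the
    box sizes here are an assumption for illustration."""
    h, w = mask.shape
    counts = [sum(mask[i:i + eps, j:j + eps].any()
                  for i in range(0, h, eps) for j in range(0, w, eps))
              for eps in sizes]
    # least-squares slope of log N against log(1/eps)
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(sizes)), np.log(counts), 1)
    return float(slope)
```

A filled square yields a dimension of 2 and a single pixel-wide line yields 1; retinal vasculature falls between the two.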

Table 2
Effects of single loss functions on multi-class vessel segmentation performance with U-Net on the DRIVE-AV dataset. 𝐿𝑜𝑠𝑠𝐵 indicates the box count loss only and 𝐿𝑜𝑠𝑠𝐶𝐹−𝐵 is the combination of cross entropy and 𝐿𝑜𝑠𝑠𝐵, as introduced in Eq. (7).
Loss F1-score ↑ Betti error↓ ICCF (95%𝐶𝐼) ↑
𝐿𝑜𝑠𝑠𝐶𝐸 69.14 ± 1.57 18.22 ± 2.63 0.58(0.25–0.72)
𝐿𝑜𝑠𝑠𝐵 61.31 ± 0.71 16.69 ± 1.19 0.52(0.14–0.75)
𝐿𝑜𝑠𝑠𝐶𝐹−𝐵 73.22 ± 0.98 7.92 ± 1.02 0.78(0.45–0.92)

size weight, the performance is moderately improved, which validates that the emphasised semantic information helps multi-class vessel segmentation.

4.4.5. Ablation 3: Effects of single loss function
In Section 4.4.3, we have verified that the feature-based losses 𝐿𝑜𝑠𝑠𝑉 and 𝐿𝑜𝑠𝑠𝐵 help the model performance. We here study the performance of a single feature-based loss function to understand the benefits of combining a pixel-based loss. Table 2 shows the performance achieved by 𝐿𝑜𝑠𝑠𝐵 and 𝐿𝑜𝑠𝑠𝐶𝐹−𝐵. 𝐿𝑜𝑠𝑠𝐵 alone works poorly in multi-class vessel segmentation, which highlights that a pixel-based segmentation loss, such as CE, is indispensable for segmentation tasks. Considering the mutual benefits, the pixel-based CE and feature-based loss functions appear to be complementary to each other.

4.4.6. Segmentation metrics vs. feature agreement
We identify that high segmentation metrics do not necessarily correspond to accurate vascular features, which breaks the common assumption that purely optimising pixel-based metrics necessarily provides accurate features. We found three aspects supporting our observation. The first aspect is the network backbone. Comparing Table 1 and Table 3, U-Net achieved higher pixel-based segmentation metrics, such as F1-score and IOU, while BF-Net offered higher feature agreement, ICCF and ICCV, on the DRIVE-AV dataset. The relationship is also visualised in Fig. 5. The second aspect is the segmentation category. According to Table 4 and Table 5, arteries showed lower segmentation metrics but higher ICCF and ICCV than veins in most of the cases. For example, 𝐿𝑜𝑠𝑠𝐶𝐹−𝐵 achieved a 5.14% higher F1-score in veins but a 13% higher ICCF in arteries on the DRIVE-AV dataset. The final aspect is the loss function. In Table 1, 𝐿𝑜𝑠𝑠𝐶𝐹−𝑉 showed better segmentation metrics but lower


Table 3
Multi-class segmentation performance with BF-Net on DRIVE-AV, LES-AV, and HRF-AV. Betti error evaluates the topological correctness of segmentation maps. ICCF evaluates the agreement of the fractal dimension with that derived from ground-truth maps, and ICCV does so for vessel density. The p-value of the Mann–Whitney U test between CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹−𝐵 and clDice is reported, as clDice is the most competitive loss function among the others.
DRIVE-AV
Loss Sensitivity ↑ F1-score ↑ IOU ↑ MSE ↓ Betti error↓ ICCF (95%𝐶𝐼) ↑ ICCV (95%𝐶𝐼) ↑
AC (Chen et al., 2019) 64.15 ± 2.78 67.31 ± 2.08 52.22 ± 1.97 3.31 ± 0.32 14.04 ± 3.91 0.62(0.31–0.76) 0.61(0.23–0.81)
GC (Zheng et al., 2021) 66.93 ± 2.42 68.87 ± 2.48 52.57 ± 2.63 3.29 ± 0.35 13.96 ± 4.69 0.66(0.26–0.82) 0.68(0.35–0.76)
clDice (Shit et al., 2021) 68.31 ± 1.88 70.27 ± 1.1 54.55 ± 1.36 3.22 ± 0.18 11.48 ± 1.54 0.71(0.28–0.88) 0.69(0.26–0.87)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝐵 72.91 ± 1.27 73.04 ± 0.58 57.99 ± 0.7 2.93 ± 0.06 7.75 ± 1.21 𝟎.𝟖𝟒(𝟎.𝟔𝟒–𝟎.𝟗𝟐) 𝟎.𝟕𝟖(𝟎.𝟓𝟔–𝟎.𝟗𝟐)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 71.93 ± 1.16 73.09 ± 0.55 58.14 ± 0.65 2.88 ± 0.09 7.91 ± 1.34 𝟎.𝟖(𝟎.𝟓𝟒–𝟎.𝟗𝟒) 𝟎.𝟕𝟒(𝟎.𝟓𝟏–𝟎.𝟖𝟖)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 𝐵 72.29 ± 2.03 73.17 ± 1.14 58.18 ± 1.38 2.89 ± 0.12 7.07 ± 0.93 𝟎.𝟖𝟐(𝟎.𝟓𝟒–𝟎.𝟗) 𝟎.𝟕𝟔(𝟎.𝟒𝟖–𝟎.𝟖𝟗)
p-value 1.36e−3 1.36e−3 1.36e−3 3.87e−3 5.07e−3 N/A N/A
LES-AV
Loss Sensitivity F1-score ↑ IOU ↑ MSE ↓ Betti error↓ ICCF (95%𝐶𝐼) ↑ ICCV (95%𝐶𝐼) ↑
AC (Chen et al., 2019) 63.66 ± 2.17 66.15 ± 2.45 51.09 ± 2.8 2.63 ± 0.27 9.3 ± 2.86 0.69(0.33–0.94) 0.72(0.26–0.94)
GC (Zheng et al., 2021) 62.85 ± 2.08 65.88 ± 2.58 50.18 ± 2.82 2.68 ± 0.18 6.76 ± 2.58 0.83(0.61–0.92) 0.83(0.57–0.94)
clDice (Shit et al., 2021) 64.16 ± 1.99 67.2 ± 2.23 51.87 ± 2.3 2.5 ± 0.19 4.51 ± 0.69 0.84(0.58–0.94) 0.83(0.61–0.92)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝐵 67.06 ± 1.76 69.87 ± 1.56 54.98 ± 1.61 2.32 ± 0.1 3.04 ± 0.66 𝟎.𝟗𝟐(𝟎.𝟖𝟑–𝟎.𝟗𝟕) 𝟎.𝟗𝟓(𝟎.𝟖𝟖–𝟎.𝟗𝟖)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 65.85 ± 1.81 69.05 ± 1.45 54.15 ± 1.59 2.35 ± 0.13 3.12 ± 0.71 𝟎.𝟗𝟏(𝟎.𝟖–𝟎.𝟗𝟕) 𝟎.𝟗𝟑(𝟎.𝟖𝟑–𝟎.𝟗𝟑)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 𝐵 66.83 ± 1.24 69.89 ± 0.9 54.95 ± 0.87 2.31 ± 0.06 2.71 ± 0.59 𝟎.𝟖𝟖(𝟎.𝟕𝟐–𝟎.𝟗𝟓) 𝟎.𝟗(𝟎.𝟕𝟖–𝟎.𝟗𝟔)
p-value 1.36e−3 1.95e−3 9.39e−4 5.38e−3 3.06e−3 N/A N/A
HRF-AV
Loss Sensitivity F1-score ↑ IOU ↑ MSE ↓ Betti error↓ ICCF (95%𝐶𝐼) ↑ ICCV (95%𝐶𝐼) ↑
AC (Chen et al., 2019) 65.19 ± 2.79 68.22 ± 0.8 53.09 ± 0.73 2.26 ± 0.04 10.81 ± 3.59 0.78(0.43–0.92) 0.81(0.42–0.94)
GC (Zheng et al., 2021) 65.74 ± 2.51 68.67 ± 1.06 53.77 ± 1.28 2.15 ± 0.05 12.93 ± 4.32 0.76(0.41–0.91) 0.77(0.32–0.92)
clDice (Shit et al., 2021) 65.66 ± 2.18 68.83 ± 0.45 54.04 ± 0.57 2.13 ± 0.05 9.48 ± 1.1 0.81(0.38–0.96) 0.79(0.46–0.92)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝐵 67.61 ± 2.48 71.19 ± 0.58 56.48 ± 0.73 1.96 ± 0.03 6.82 ± 1.16 𝟎.𝟖𝟔(𝟎.𝟕𝟏–𝟎.𝟗𝟔) 𝟎.𝟗𝟏(𝟎.𝟖𝟐–𝟎.𝟗𝟔)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 71.14 ± 1.87 72.31 ± 0.71 57.76 ± 0.91 1.96 ± 0.05 7.68 ± 0.9 𝟎.𝟖𝟐(𝟎.𝟔𝟖–𝟎.𝟗𝟏) 𝟎.𝟖𝟔(𝟎.𝟕𝟐–𝟎.𝟗𝟑)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 𝐵 68.29 ± 2.34 71.39 ± 0.77 56.74 ± 0.94 1.95 ± 0.04 7.11 ± 1.36 𝟎.𝟖𝟑(𝟎.𝟕𝟏–𝟎.𝟗𝟐) 𝟎.𝟖𝟓(𝟎.𝟕𝟔–𝟎.𝟗𝟑)
p-value 5.38e−3 9.39e−4 9.39e−4 9.39e−4 2.76e−3 N/A N/A
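The ICC values in Tables 1 and 3 quantify agreement between a feature measured on segmentations and the same feature measured on ground-truth maps. A minimal sketch of a one-way random-effects ICC(1,1); this is one common formulation, and the exact variant used in the paper (following Cheung et al., 2021) is not specified in this excerpt:

```python
import numpy as np

def icc_oneway(x, y):
    """One-way random-effects ICC(1,1) between two sets of paired
    measurements, e.g. a vascular feature computed from segmentations (x)
    and from ground-truth maps (y).  Formulation assumed for
    illustration."""
    data = np.stack([np.asarray(x, float), np.asarray(y, float)], axis=1)
    n, k = data.shape                        # n targets, k = 2 "raters"
    grand = data.mean()
    row_means = data.mean(axis=1)
    msb = k * ((row_means - grand) ** 2).sum() / (n - 1)            # between targets
    msw = ((data - row_means[:, None]) ** 2).sum() / (n * (k - 1))  # within targets
    return (msb - msw) / (msb + (k - 1) * msw)
```

Perfectly matching measurements give an ICC of 1, and disagreement within image pairs pulls the value towards (or below) zero.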

ICCF and ICCV than 𝐿𝑜𝑠𝑠𝐶𝐹−𝑉𝐵 on the HRF-AV dataset. When the loss weights 𝜆 and 𝛽 change for 𝐿𝑜𝑠𝑠𝐶𝐹−𝑉𝐵, the F1-score and ICCF show contrary general trends, depicted in the third and fourth columns of Fig. 5. These three aspects reveal that it is questionable to believe that better segmentation metrics correspond to more accurate vascular features. Similarly, previous work found that topology correctness cannot be ensured by pixel-based segmentation metrics and proposed specific regularisation on topology. Here we further discovered the potential inconsistency between pixel-based segmentation metrics and vascular feature agreement, thus highlighting the importance of directly regulating the features to improve feature accuracy for downstream clinical tasks.

4.4.7. Quantitative clinical impact
We use measured vascular features to predict three-year ischaemic stroke incidence. From Table 1 and Table 3, we can identify that BF-Net performed best in feature agreement ICCF and ICCV for all loss functions on the three datasets. Hence, we used the trained BF-Net to segment the ischaemic stroke database and then measure the vascular features, including fractal dimension and vessel density. We train and validate a logistic regression model with the input of a single vascular feature and the output of stroke incidence.

The results with the input of the fractal dimension and vessel density are shown in Fig. 6. The best prediction performance is achieved with the artery fractal dimension: AUC-ROC reached 0.7 and AUC-PR reached 0.65 with CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹−𝐵, outperforming the other three compared loss functions, as depicted in Fig. 6(a). This demonstrates that CF-Loss quantitatively improved downstream task performance with accurate vascular features. The logistic regression model with the input of artery vessel density also achieved an AUC-ROC of 0.69 (Fig. 6(b)), clearly outperforming the models based on other loss functions. For the vein fractal dimension and vessel density, we observe that the ROC and PR curves from the four loss functions largely overlap, as shown in Fig. 6(c) and 6(d), despite the clear difference in veins' features in Table 4 and Table 5. This suggests that ischaemic stroke prediction is not sensitive to veins' fractal dimension and vessel density. Considering that arteries' features contribute to better prediction performance, arteries are likely to be better markers for ischaemic stroke prediction in the collected database. This is consistent with the findings of recent work (Sandoval-Garcia et al., 2021).

4.4.8. Computation efficiency
CF-Loss requires 𝑂(𝑛²) computational complexity for a two-dimensional image with a size of 𝑛 × 𝑛, which is at the same level of complexity as AC, GC, and clDice. For model training, the proposed CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹−𝐵 on average took 1.64 s for a batch size of 2 and an image size of (720, 720), while AC, GC, and clDice respectively took 1.87 s, 1.91 s, and 2.17 s. The inference time for CF-Loss and the compared loss functions is equal.

5. Discussion and conclusion

We introduce a new feature-based loss function that directly encodes the computation of downstream vascular features used as clinical disease markers, to optimise segmentation networks. We combine the proposed feature-based loss and pixel-based loss as CF-Loss. Experimental results show that CF-Loss improved both multi-class vessel segmentation and vascular feature measurement, compared to three recent loss functions which consider only the interim segmentation rather than the final features. In particular, the combination of 𝐿𝑜𝑠𝑠𝐶𝐹−𝐵 and BF-Net achieved the best results in feature measurement. With CF-Loss, both artery and vein segmentation have been improved, as shown in Table 4 and Table 5. We also conduct three ablation studies to show the efficacy of the designed components, including the single loss function and box size weight. The analysis of segmentation metrics vs. feature agreement (Section 4.4.6) highlights the significance of optimising features directly with the CF-Loss proposed in this work. For the loss weights 𝜆 and 𝛽, respectively for 𝐿𝑜𝑠𝑠𝐶𝐹−𝐵 and 𝐿𝑜𝑠𝑠𝐶𝐹−𝑉, there appears to be a trade-off between further optimising pixel-based segmentation metrics and vascular feature agreement. In scenarios that require accurate features for clinical downstream tasks, such as ischaemic stroke prediction, 𝐿𝑜𝑠𝑠𝐶𝐹−𝐵 is preferred.

Future work will focus on investigating the combination strategy of multiple feature-based loss functions, as well as the principle behind


Table 4
Artery and vein segmentation performance with U-Net on DRIVE-AV, LES-AV, and HRF-AV.
Evaluation category: artery
DRIVE-AV
Loss Sensitivity F1-score ↑ IOU ↑ MSE ↓ Betti error↓ ICCF (95%𝐶𝐼) ↑ ICCV (95%𝐶𝐼) ↑
AC (Chen et al., 2019) 64.42 ± 1.55 66.91 ± 2.09 51.37 ± 1.41 3.08 ± 0.85 8.84 ± 1.16 0.74(0.36–0.9) 0.73(0.35–0.88)
GC (Zheng et al., 2021) 64.73 ± 1.98 67.35 ± 2.39 52.29 ± 1.46 3.15 ± 0.62 17.21 ± 2.18 0.77(0.44–0.91) 0.75(0.38–0.88)
clDice (Shit et al., 2021) 65.63 ± 2.36 68.37 ± 1.76 53.39 ± 1.89 2.97 ± 0.29 6.9 ± 1.56 0.75(0.36–0.89) 0.77(0.54–0.92)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝐵 67.71 ± 2.47 71.44 ± 1.27 55.78 ± 1.52 2.77 ± 0.16 4.91 ± 1.67 𝟎.𝟖(𝟎.𝟓𝟐–𝟎.𝟗𝟐) 𝟎.𝟖𝟖(𝟎.𝟕𝟏–𝟎.𝟗𝟓)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 70.56 ± 1.35 72.74 ± 0.84 57.33 ± 1.04 2.71 ± 0.07 4.23 ± 0.65 𝟎.𝟕𝟖(𝟎.𝟒𝟗–𝟎.𝟗𝟑) 𝟎.𝟖𝟓(𝟎.𝟔𝟖–𝟎.𝟗𝟐)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 𝐵 69.38 ± 1.41 72.21 ± 1.06 56.69 ± 1.28 2.73 ± 0.15 4.84 ± 2.52 𝟎.𝟖𝟕(𝟎.𝟔𝟖–𝟎.𝟗𝟓) 𝟎.𝟗(𝟎.𝟕𝟔–𝟎.𝟗𝟔)
p-value 1.81e−2 1.36e−3 2.39e−3 9.39e−4 1.36e−3 N/A N/A
LES-AV
Loss Sensitivity F1-score ↑ IOU ↑ MSE ↓ Betti error↓ ICCF (95%𝐶𝐼) ↑ ICCV (95%𝐶𝐼) ↑
AC (Chen et al., 2019) 58.32 ± 2.66 61.83 ± 1.63 46.64 ± 1.83 2.75 ± 0.58 7.38 ± 2.56 0.66(0.31–0.82) 0.58(0.15–0.75)
GC (Zheng et al., 2021) 58.45 ± 2.59 61.83 ± 1.79 46.99 ± 1.29 2.78 ± 0.35 7.84 ± 2.37 0.67(0.12–0.89) 0.58(0.2–0.76)
clDice (Shit et al., 2021) 59.15 ± 1.28 62.14 ± 1.22 47.21 ± 2.04 2.72 ± 0.11 7.53 ± 1.68 0.65(0.24–0.83) 0.59(0.21–0.78)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝐵 61.2 ± 2.63 64.06 ± 1.78 48.25 ± 1.94 2.52 ± 0.08 3.87 ± 1.24 𝟎.𝟕𝟐(𝟎.𝟒–𝟎.𝟗𝟐) 𝟎.𝟔𝟖(𝟎.𝟑𝟑–𝟎.𝟖𝟒)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 62.49 ± 2.28 64.85 ± 1.38 49.04 ± 1.38 2.44 ± 0.12 3.95 ± 1.07 𝟎.𝟔𝟕(𝟎.𝟑𝟔–𝟎.𝟖𝟒) 𝟎.𝟔𝟓(𝟎.𝟑𝟑–𝟎.𝟖𝟓)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 𝐵 61.04 ± 1.68 62.92 ± 1.34 47.1 ± 1.03 2.61 ± 0.14 3.75 ± 0.55 𝟎.𝟕𝟕(𝟎.𝟑𝟔–𝟎.𝟗𝟒) 𝟎.𝟕𝟏(𝟎.𝟑𝟐–𝟎.𝟗𝟐)
p-value 1.81e−2 1.36e−2 2.39e−2 3.88e−3 2.76e−3 N/A N/A
HRF-AV
Loss Sensitivity F1-score ↑ IOU ↑ MSE ↓ Betti error↓ ICCF (95%𝐶𝐼) ↑ ICCV (95%𝐶𝐼) ↑
AC (Chen et al., 2019) 64.87 ± 2.55 69.61 ± 1.25 54.78 ± 1.39 2.23 ± 0.12 8.17 ± 0.67 0.87(0.69–0.95) 0.78(0.46–0.92)
GC (Zheng et al., 2021) 65.79 ± 2.21 70.25 ± 1.13 55.29 ± 1.48 2.26 ± 0.09 10.67 ± 1.98 0.88(0.8–0.95) 0.73(0.36–0.9)
clDice (Shit et al., 2021) 66.93 ± 1.78 70.38 ± 1.29 55.52 ± 0.88 2.24 ± 0.08 8.24 ± 0.92 0.88(0.82–0.95) 0.79(0.46–0.94)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝐵 69.32 ± 1.95 72.61 ± 0.66 57.2 ± 0.79 2.03 ± 0.04 5.6 ± 0.53 𝟎.𝟗𝟏(𝟎.𝟕𝟗–𝟎.𝟗𝟕) 𝟎.𝟖𝟔(𝟎.𝟔𝟕–𝟎.𝟗𝟓)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 69.54 ± 2.03 72.84 ± 0.42 57.49 ± 0.51 2.01 ± 0.05 5.11 ± 0.85 𝟎.𝟖𝟏(𝟎.𝟒𝟕–𝟎.𝟖𝟓) 𝟎.𝟖𝟒(𝟎.𝟕𝟗–𝟎.𝟗𝟔)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 𝐵 68.83 ± 2.22 72.24 ± 1.14 56.76 ± 1.5 2.05 ± 0.06 5.55 ± 0.45 𝟎.𝟗𝟐(𝟎.𝟖𝟏–𝟎.𝟗𝟕) 𝟎.𝟗𝟑(𝟎.𝟖𝟒–𝟎.𝟗𝟕)
p-value 3.88e−3 1.01e−2 2.39e−2 9.39e−4 3.88e−3 N/A N/A
Evaluation category: vein
DRIVE-AV
Loss Sensitivity F1-score ↑ IOU ↑ MSE ↓ Betti error↓ ICCF (95%𝐶𝐼) ↑ ICCV (95%𝐶𝐼) ↑
AC (Chen et al., 2019) 71.25 ± 1.66 71.01 ± 1.61 56.47 ± 1.85 3.37 ± 0.12 13.43 ± 0.89 0.52(0.12–0.82) 0.45(0.08–0.76)
GC (Zheng et al., 2021) 71.82 ± 1.81 72.43 ± 1.7 57.51 ± 1.65 3.49 ± 0.09 25.69 ± 2.27 0.58(0.28–0.86) 0.46(0.12–0.75)
clDice (Shit et al., 2021) 73.24 ± 1.18 73.71 ± 1.53 59.47 ± 1.65 3.31 ± 0.11 13.37 ± 1.42 0.55(0.28–0.82) 0.48(0.21–0.88)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝐵 77.18 ± 0.65 76.58 ± 0.76 62.12 ± 1.07 3.18 ± 0.09 11.4 ± 0.89 𝟎.𝟔𝟕(𝟎.𝟑𝟔–𝟎.𝟖𝟕) 𝟎.𝟓𝟒(𝟎.𝟐𝟖–𝟎.𝟖𝟐)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 78.04 ± 1.61 77.31 ± 0.56 63.13 ± 0.74 2.98 ± 0.06 9.27 ± 0.82 𝟎.𝟔𝟔(𝟎.𝟏𝟓–𝟎.𝟖𝟕) 𝟎.𝟓𝟔(𝟎.𝟑𝟐–𝟎.𝟖𝟐)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 𝐵 76.45 ± 0.48 76.92 ± 1.01 62.61 ± 1.32 2.98 ± 0.11 9.43 ± 1.71 𝟎.𝟔𝟗(𝟎.𝟒𝟏–𝟎.𝟖𝟖) 𝟎.𝟓𝟖(𝟎.𝟑𝟑–𝟎.𝟖𝟓)
p-value 9.39e−4 9.39e−4 5.38e−3 1.95e−3 7.41e−3 N/A N/A
LES-AV
Loss Sensitivity F1-score ↑ IOU ↑ MSE ↓ Betti error↓ ICCF (95%𝐶𝐼) ↑ ICCV (95%𝐶𝐼) ↑
AC (Chen et al., 2019) 60.29 ± 2.72 63.79 ± 2.53 49.47 ± 2.49 3.15 ± 0.18 13.11 ± 1.28 0.67(0.31–0.82) 0.65(0.23–0.79)
GC (Zheng et al., 2021) 61.77 ± 2.35 64.38 ± 1.94 50.75 ± 1.77 3.08 ± 0.29 26.29 ± 2.72 0.68(0.12–0.89) 0.66(0.24–0.83)
clDice (Shit et al., 2021) 61.86 ± 2.48 65.67 ± 1.48 50.7 ± 1.63 3.14 ± 0.13 14.18 ± 1.34 0.67(0.24–0.83) 0.68(0.32–0.81)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝐵 63.39 ± 1.89 68.39 ± 1.37 53.37 ± 1.61 2.71 ± 0.23 5.27 ± 1.68 𝟎.𝟕𝟑(𝟎.𝟑𝟒–𝟎.𝟗𝟏) 𝟎.𝟕𝟒(𝟎.𝟐𝟖–𝟎.𝟗𝟒)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 63.59 ± 2.52 68.57 ± 2.78 53.56 ± 2.64 2.69 ± 0.19 6.5 ± 1.21 𝟎.𝟕(𝟎.𝟐𝟓–𝟎.𝟖𝟗) 𝟎.𝟕𝟓(𝟎.𝟐𝟏–𝟎.𝟖𝟖)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 𝐵 61.84 ± 1.23 66.21 ± 1.75 51.99 ± 1.59 2.94 ± 0.12 5.67 ± 1.64 𝟎.𝟕𝟖(𝟎.𝟐𝟒–𝟎.𝟗𝟒) 𝟎.𝟕𝟓(𝟎.𝟏𝟔–𝟎.𝟗𝟏)
p-value 2.39e−2 7.41e−3 1.01e−2 1.36e−3 9.39e−4 N/A N/A
HRF-AV
Loss Sensitivity F1-score ↑ IOU ↑ MSE ↓ Betti error↓ ICCF (95%𝐶𝐼) ↑ ICCV (95%𝐶𝐼) ↑
AC (Chen et al., 2019) 71.93 ± 0.98 74.93 ± 0.98 60. ± 0.88 2.0 ± 0.05 11.31 ± 0.48 0.62(0.22–0.78) 0.61(0.18–0.82)
GC (Zheng et al., 2021) 72.93 ± 0.98 75.48 ± 0.63 61.73 ± 0.82 2.2 ± 0.07 14.84 ± 2.4 0.58(0.24–0.74) 0.62(0.26–0.76)
clDice (Shit et al., 2021) 73.93 ± 0.98 75.14 ± 1.04 61.12 ± 0.67 2.27 ± 0.04 11.39 ± 0.46 0.64(0.21–0.82) 0.66(0.26–0.81)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝐵 76.23 ± 1.93 77.72 ± 0.47 63.67 ± 0.62 1.96 ± 0.02 8.34 ± 0.96 𝟎.𝟕𝟏(𝟎.𝟑𝟏–𝟎.𝟖𝟗) 𝟎.𝟕(𝟎.𝟑𝟓–𝟎.𝟖𝟖)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 76.73 ± 1.69 77.89 ± 0.36 63.92 ± 0.45 1.95 ± 0.05 8.24 ± 0.33 𝟎.𝟔𝟖(𝟎.𝟐𝟗–𝟎.𝟖𝟖) 𝟎.𝟕𝟔(𝟎.𝟒𝟐–𝟎.𝟗)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 𝐵 75.65 ± 2.45 77.37 ± 0.96 63.23 ± 1.25 1.98 ± 0.06 8.15 ± 0.67 𝟎.𝟕𝟑(𝟎.𝟑𝟑–𝟎.𝟖𝟗) 𝟎.𝟕𝟖(𝟎.𝟒𝟔–𝟎.𝟗𝟏)
p-value 9.39e−4 1.36e−3 9.39e−4 9.39e−4 1.36e−3 N/A N/A
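The box-size-weighted box-count loss ablated in Section 4.4.4 can be sketched as follows. A hard count stands in for the paper's differentiable soft count, and the exact weighting and normalisation of Eq. (6) are assumptions for illustration:

```python
import numpy as np

def box_count(mask, eps):
    """Number of eps-by-eps boxes containing at least one foreground pixel."""
    h, w = mask.shape
    return sum(mask[i:i + eps, j:j + eps].any()
               for i in range(0, h, eps) for j in range(0, w, eps))

def box_count_loss(seg, target, sizes=(2, 4, 8), size_weighted=True):
    """Multi-scale relative box-count discrepancy between a segmentation
    and its target, following the Loss_B' formula in Section 4.4.4.
    With size_weighted=True, each scale's squared term is additionally
    multiplied by its box size, mimicking the box size weights of
    Eq. (6); the exact normalisation is assumed.  The paper trains with
    a *soft* box count -- the hard count is used here only to show the
    loss structure."""
    total = 0.0
    for eps in sizes:
        n_t = box_count(target, eps)
        n_s = box_count(seg, eps)
        term = ((n_t - n_s) / n_t) ** 2 if n_t else 0.0
        total += eps * term if size_weighted else term
    return float(np.sqrt(total))
```

Larger boxes carry more weight under the size-weighted variant, emphasising coarse-scale (semantic) discrepancies between the predicted and target vessel maps.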

considering that the simple summation of 𝐿𝑜𝑠𝑠𝐵 and 𝐿𝑜𝑠𝑠𝑉 in Eq. (9) did not provide a clear extra benefit. This raises a few avenues for investigation. First, a more sophisticated integration might be required to maximise the effects of each component, considering that the linear combination cannot further help the performance. The compatibility of the specific format of CF-Loss and the network architecture is also worthy of exploration. Although Tables 1 and 3 show that all combinations of CF-Loss and architectures produce better performance than the compared loss functions, the box count loss 𝐿𝑜𝑠𝑠𝐵 cooperates particularly well with BF-Net, an information fusion architecture. One reason might be that the multi-size view in the box counts of 𝐿𝑜𝑠𝑠𝐵 enhances the network's ability to learn multi-scale features. Beyond our specific application to two vascular markers from retinal fundus photographs, we will explore encoding additional features into this clinically-relevant feature optimised loss function. Although we show improvements in the accuracy of some features such as vessel density and fractal dimension, tortuosity and calibre metrics might offer extra regularisation to model training. This will also help explore the best disease markers among a wide range of vascular features. In stroke incidence prediction, we have not adjusted the logistic regression model for demographic factors, as our target is to compare the derived vascular features from CF-Loss and recent loss


Table 5
Artery and vein segmentation performance with BF-Net on DRIVE-AV, LES-AV, and HRF-AV.
Evaluation category: artery
DRIVE-AV
Loss Sensitivity ↑ F1-score ↑ IOU ↑ MSE ↓ Betti error↓ ICCF (95%𝐶𝐼) ↑ ICCV (95%𝐶𝐼) ↑
AC (Chen et al., 2019) 61.85 ± 2.19 65.16 ± 2.12 50.15 ± 1.62 3.21 ± 0.26 10.27 ± 2.58 0.64(0.09–0.86) 0.68(0.32–0.84)
GC (Zheng et al., 2021) 64.04 ± 2.77 66.71 ± 2.82 50.09 ± 2.38 3.16 ± 0.13 10.42 ± 2.75 0.71(0.27–0.89) 0.74(0.45–0.91)
clDice (Shit et al., 2021) 64.63 ± 2.54 68.39 ± 1.62 52.43 ± 1.47 3.12 ± 0.14 8.57 ± 1.18 0.74(0.45–0.9) 0.77(0.49–0.92)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝐵 68.45 ± 2.47 71.11 ± 0.93 55.35 ± 1.12 2.85 ± 0.09 4.5 ± 1.25 𝟎.𝟖𝟓(𝟎.𝟔𝟐–𝟎.𝟗𝟒) 𝟎.𝟖𝟐(𝟎.𝟓𝟕–𝟎.𝟗𝟑)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 68.75 ± 1.98 71.86 ± 0.51 56.25 ± 0.63 2.76 ± 0.07 4.7 ± 1.06 𝟎.𝟖𝟏(𝟎.𝟓𝟐–𝟎.𝟗𝟑) 𝟎.𝟕𝟗(𝟎.𝟒𝟗–𝟎.𝟗𝟓)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 𝐵 69.48 ± 1.64 71.33 ± 1.34 55.62 ± 1.58 2.86 ± 0.11 4.8 ± 0.55 𝟎.𝟖𝟒(𝟎.𝟔𝟐–𝟎.𝟗𝟑) 𝟎.𝟖𝟐(𝟎.𝟓𝟗–𝟎.𝟗𝟏)
p-value 9.39e−4 9.39e−4 7.41e−3 1.36e−3 9.39e−4 N/A N/A
LES-AV
Loss Sensitivity F1-score ↑ IOU ↑ MSE ↓ Betti error↓ ICCF (95%𝐶𝐼) ↑ ICCV (95%𝐶𝐼) ↑
AC (Chen et al., 2019) 64.07 ± 2.16 65.49 ± 2.28 50.31 ± 1.92 2.57 ± 0.18 9.41 ± 2.67 0.68(0.3–0.92) 0.68(0.28–0.79)
GC (Zheng et al., 2021) 63.24 ± 2.09 64.18 ± 1.58 49.53 ± 1.28 2.53 ± 0.22 7.34 ± 2.18 0.83(0.56–0.95) 0.81(0.54–0.92)
clDice (Shit et al., 2021) 65.39 ± 2.08 65.57 ± 2.48 50.48 ± 1.63 2.46 ± 0.09 5.54 ± 0.36 0.82(0.47–0.96) 0.82(0.64–0.91)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝐵 68.11 ± 1.96 68.43 ± 1.34 52.94 ± 1.4 2.28 ± 0.1 3.48 ± 0.83 𝟎.𝟖𝟔(𝟎.𝟓𝟏–𝟎.𝟗𝟔) 𝟎.𝟗𝟏(𝟎.𝟔𝟕–𝟎.𝟗𝟖)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 66.69 ± 2.62 67.53 ± 1.45 51.91 ± 1.69 2.33 ± 0.19 3.85 ± 1.62 𝟎.𝟖𝟓(𝟎.𝟓𝟐–𝟎.𝟗𝟑) 𝟎.𝟖𝟗(𝟎.𝟔𝟒–𝟎.𝟗𝟓)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 𝐵 66.4 ± 1.89 68.17 ± 0.99 52.52 ± 1.03 2.26 ± 0.07 2.81 ± 0.8 𝟎.𝟖𝟓(𝟎.𝟒𝟔–𝟎.𝟗𝟔) 𝟎.𝟗(𝟎.𝟔𝟑–𝟎.𝟗𝟕)
p-value 1.95e−3 3.88e−3 3.13e−2 1.36e−2 9.39e−4 N/A N/A
HRF-AV
Loss Sensitivity F1-score ↑ IOU ↑ MSE ↓ Betti error↓ ICCF (95%𝐶𝐼) ↑ ICCV (95%𝐶𝐼) ↑
AC (Chen et al., 2019) 65.45 ± 2.19 67.43 ± 1.51 52.17 ± 2.05 2.33 ± 0.09 7.76 ± 3.15 0.86(0.68–0.95) 0.87(0.72–0.94)
GC (Zheng et al., 2021) 65.49 ± 2.66 68.25 ± 1.65 53.48 ± 1.75 2.27 ± 0.12 9.32 ± 2.36 0.83(0.59–0.93) 0.82(0.61–0.94)
clDice (Shit et al., 2021) 65.53 ± 1.48 68.15 ± 1.83 53.23 ± 1.22 2.29 ± 0.08 7.43 ± 1.04 0.89(0.74–0.96) 0.86(0.7–0.96)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝐵 67.83 ± 2.53 71.42 ± 1.03 55.75 ± 1.24 2.11 ± 0.03 5.47 ± 0.78 𝟎.𝟗𝟐(𝟎.𝟖–𝟎.𝟗𝟕) 𝟎.𝟗𝟓(𝟎.𝟖𝟗–𝟎.𝟗𝟖)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 71.32 ± 1.89 72.72 ± 0.81 57.3 ± 0.99 2.09 ± 0.06 5.8 ± 0.72 𝟎.𝟖𝟗(𝟎.𝟕𝟑–𝟎.𝟗𝟔) 𝟎.𝟗𝟏(𝟎.𝟕𝟗–𝟎.𝟗𝟕)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 𝐵 67.7 ± 1.48 71.59 ± 1.32 55.96 ± 1.58 2.08 ± 0.06 5.18 ± 1.41 𝟎.𝟗(𝟎.𝟕𝟓–𝟎.𝟗𝟔) 𝟎.𝟖𝟗(𝟎.𝟕𝟒–𝟎.𝟗𝟔)
p-value 5.38e−3 1.36e−3 3.88e−3 1.36e−3 1.95e−3 N/A N/A
Evaluation category: vein
DRIVE-AV
Loss Sensitivity ↑ F1-score ↑ IOU ↑ MSE ↓ Betti error↓ ICCF (95%𝐶𝐼) ↑ ICCV (95%𝐶𝐼) ↑
AC (Chen et al., 2019) 70.14 ± 2.49 70.17 ± 2.43 55.61 ± 1.76 3.65 ± 0.23 16.23 ± 2.38 0.61(0.22–0.81) 0.53(0.14–0.82)
GC (Zheng et al., 2021) 72.95 ± 1.42 71.58 ± 2.25 55.73 ± 1.38 3.57 ± 0.14 15.17 ± 2.93 0.63(0.28–0.84) 0.62(0.23–0.78)
clDice (Shit et al., 2021) 74.66 ± 2.39 73.41 ± 1.25 57.68 ± 1.32 3.52 ± 0.15 13.21 ± 2.29 0.65(0.31–0.87) 0.62(0.18–0.85)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝐵 78.98 ± 2.48 76.44 ± 0.4 61.98 ± 0.53 3.16 ± 0.09 10.98 ± 2.77 𝟎.𝟖𝟐(𝟎.𝟓𝟕–𝟎.𝟗𝟐) 𝟎.𝟕𝟏(𝟎.𝟒𝟖–𝟎.𝟖𝟖)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 77.61 ± 1.36 76.33 ± 0.54 61.83 ± 0.72 3.13 ± 0.12 11.14 ± 2.06 𝟎.𝟕𝟗(𝟎.𝟒𝟗–𝟎.𝟗𝟐) 𝟎.𝟔𝟖(𝟎.𝟐𝟗–𝟎.𝟖𝟕)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 𝐵 77.21 ± 2.31 76.56 ± 0.95 62.15 ± 1.24 3.07 ± 0.15 9.45 ± 1.95 𝟎.𝟖(𝟎.𝟒𝟗–𝟎.𝟗𝟐) 𝟎.𝟔𝟗(𝟎.𝟑𝟒–𝟎.𝟖𝟓)
p-value 1.36e−3 9.39e−4 9.39e−4 1.36e−2 1.36e−2 N/A N/A
LES-AV
Loss Sensitivity F1-score ↑ IOU ↑ MSE ↓ Betti error↓ ICCF (95%𝐶𝐼) ↑ ICCV (95%𝐶𝐼) ↑
AC (Chen et al., 2019) 63.43 ± 2.36 68.19 ± 1.97 53.34 ± 1.88 2.85 ± 0.14 8.23 ± 2.38 0.72(0.44–0.88) 0.74(0.48–0.91)
GC (Zheng et al., 2021) 62.13 ± 2.12 67.98 ± 1.78 53.21 ± 1.73 2.76 ± 0.17 6.29 ± 2.06 0.85(0.76–0.92) 0.84(0.73–0.91)
clDice (Shit et al., 2021) 64.18 ± 1.91 69.34 ± 1.45 54.54 ± 1.89 2.68 ± 0.08 4.45 ± 0.94 0.87(0.78–0.94) 0.86(0.76–0.93)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝐵 67.42 ± 2.73 71.96 ± 1.87 57.46 ± 1.34 2.4 ± 0.11 2.73 ± 0.74 𝟎.𝟗𝟖(𝟎.𝟗𝟑–𝟎.𝟗𝟗) 𝟎.𝟗𝟔(𝟎.𝟖𝟓–𝟎.𝟗𝟗)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 66.29 ± 2.49 71.19 ± 1.69 56.75 ± 1.78 2.42 ± 0.09 2.59 ± 0.45 𝟎.𝟗𝟒(𝟎.𝟕𝟖–𝟎.𝟗𝟖) 𝟎.𝟗𝟐(𝟎.𝟕𝟑–𝟎.𝟗𝟖)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 𝐵 68.39 ± 1.16 72.29 ± 0.89 57.78 ± 0.82 2.39 ± 0.07 2.66 ± 0.5 𝟎.𝟗𝟏(𝟎.𝟔𝟔–𝟎.𝟗𝟖) 𝟎.𝟗𝟏(𝟎.𝟔𝟖–𝟎.𝟗𝟖)
p-value 2.76e−3 1.95e−3 1.95e−3 9.39e−4 3.88e−3 N/A N/A
HRF-AV
Loss Sensitivity F1-score ↑ IOU ↑ MSE ↓ Betti error↓ ICCF (95%𝐶𝐼) ↑ ICCV (95%𝐶𝐼) ↑
AC (Chen et al., 2019) 71.51 ± 2.25 72.36 ± 1.36 58.24 ± 0.97 2.36 ± 0.04 12.03 ± 1.39 0.69(0.33–0.84) 0.71(0.22–0.85)
GC (Zheng et al., 2021) 71.17 ± 2.48 72.36 ± 1.58 59.27 ± 1.15 2.2 ± 0.05 15.02 ± 2.12 0.68(0.28–0.87) 0.69(0.42–0.84)
clDice (Shit et al., 2021) 71.26 ± 2.64 73.54 ± 0.83 59.19 ± 1.32 2.2 ± 0.05 11.49 ± 1.7 0.72(0.41–0.89) 0.71(0.36–0.92)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝐵 74.14 ± 2.26 76.89 ± 0.59 62.57 ± 0.77 1.99 ± 0.03 8.91 ± 0.82 𝟎.𝟕𝟕(𝟎.𝟒𝟒–𝟎.𝟗𝟏) 𝟎.𝟖𝟐(𝟎.𝟓𝟓–𝟎.𝟗𝟑)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 77.79 ± 2.86 77.61 ± 0.9 63.53 ± 1.16 2.02 ± 0.05 10.27 ± 1.55 𝟎.𝟕𝟐(𝟎.𝟑𝟏–𝟎.𝟖𝟗) 𝟎.𝟕𝟐(𝟎.𝟑𝟐–𝟎.𝟖𝟗)
CF-Loss 𝐿𝑜𝑠𝑠𝐶𝐹 −𝑉 𝐵 75.58 ± 2.63 77.1 ± 0.59 62.86 ± 0.77 2.01 ± 0.07 9.72 ± 2.18 𝟎.𝟔𝟕(𝟎.𝟐𝟗–𝟎.𝟖𝟕) 𝟎.𝟕𝟏(𝟎.𝟑–𝟎.𝟖𝟗)
p-value 4.06e−2 9.39e−4 1.36e−3 9.39e−4 2.76e−3 N/A N/A
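The Betti error reported in the tables above compares the topology of predicted and ground-truth vessel maps. Assuming it counts mismatches in connected components (the Betti number b0) between the two binary masks — the paper's exact protocol, such as patch-wise averaging, is defined outside this excerpt — a minimal sketch is:

```python
import numpy as np
from scipy import ndimage

def betti0_error(pred_mask: np.ndarray, gt_mask: np.ndarray) -> int:
    """Absolute difference in the number of connected components
    (Betti number b0) between a predicted and a ground-truth
    binary vessel mask."""
    structure = np.ones((3, 3), dtype=int)  # 8-connectivity for 2D masks
    _, n_pred = ndimage.label(pred_mask > 0, structure=structure)
    _, n_gt = ndimage.label(gt_mask > 0, structure=structure)
    return abs(n_pred - n_gt)

# toy example: the prediction breaks one vessel into two segments
gt = np.zeros((8, 8), dtype=np.uint8)
gt[4, 1:7] = 1                 # one continuous vessel -> b0 = 1
pred = gt.copy()
pred[4, 3] = 0                 # a gap splits the vessel -> b0 = 2
print(betti0_error(pred, gt))  # -> 1
```

A disconnection that is invisible to pixel-wise metrics (a single dropped pixel here) changes the topology and is therefore penalised by this measure.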

functions. This will be studied in future statistical research. Additionally, we believe that embedding medical and physical prior knowledge of vessel characteristics, morphology, and bifurcation in deep learning models may further benefit multi-class vessel segmentation. Prior knowledge of morphological feature distributions observed in large-scale data, such as the volume prior in cardiac imaging (Kervadec et al., 2019), can serve as an effective regulariser for deep learning model training. While machine learning models are more sensitive to differences in pixel density between categories, physicians typically apply stronger logical reasoning when distinguishing multi-class vessels. For example, physicians can infer the category of challenging downstream vessel branches from knowledge of the upstream vessels, which even state-of-the-art methods can hardly achieve. Formulating and integrating such medical and physical prior knowledge into model development will boost the accuracy of both feature measurement and multi-class segmentation.

More broadly, our results motivate wider use of feature-based loss functions in medical image computing, as feature measurement provides a quantitative and interpretable reference for disease diagnosis, progression monitoring, and disease marker discovery in many medical fields, such as anatomical volumes in the brain, lesion load in the prostate, and fibrosis density in the lungs. By building the feature-based loss function into model training as a regulariser, more accurate clinically-relevant features can be obtained than with pixel-based loss functions alone.
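The general idea of a feature-based loss term can be illustrated with a toy example. The sketch below is not the paper's CF-Loss (its formulation lives in the methods section, outside this excerpt); it simply adds a soft vessel-density discrepancy term to cross-entropy, so the gradient rewards segmentations whose global feature matches the ground truth rather than only per-pixel labels. The function names and the single density feature are illustrative assumptions:

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def feature_loss(logits, target_onehot, w_feat=1.0):
    """Cross-entropy plus an illustrative vessel-density feature term.

    logits:        (H, W, C) raw network outputs
    target_onehot: (H, W, C) one-hot ground truth
    """
    p = softmax(logits)
    # standard pixel-wise cross-entropy
    ce = -np.mean(np.sum(target_onehot * np.log(p + 1e-8), axis=-1))
    # soft vessel density per class: fraction of image area covered,
    # computed from probabilities so the term stays differentiable
    vd_pred = p.mean(axis=(0, 1))            # (C,)
    vd_gt = target_onehot.mean(axis=(0, 1))  # (C,)
    feat = np.abs(vd_pred - vd_gt).sum()
    return ce + w_feat * feat
```

A confident, correct prediction drives both terms towards zero; a prediction that matches most pixels but systematically thins or thickens vessels still incurs a density penalty, which is the behaviour a feature-optimised loss is designed to enforce.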
CF-Loss contributes to downstream clinical research, such as oculomics, and could promote the deployment of automated AI techniques in clinical applications. Beyond the observation of visible pathologies, such as exudates and haemorrhages, vascular features can reveal imperceptible morphological alterations that potentially imply circulatory and metabolic dysfunction. By integrating our method into the daily clinical routine, both vascular features and multi-class vessel maps can be supplied as auxiliary tools for mining implicit disease-related information. To this end, future work will evaluate the robustness of CF-Loss on large-scale clinical datasets, such as the AlzEye data, and on a wider range of disease diagnosis tasks.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data for multi-class vessel segmentation are publicly available. Data for the clinical experiments cannot be shared.

Acknowledgements

This work is supported by EPSRC, United Kingdom grants EP/M020533/1, EP/R014019/1 and EP/V034537/1, as well as the NIHR UCLH Biomedical Research Centre, United Kingdom. Dr Keane is supported by a Moorfields Eye Charity Career Development Award, United Kingdom (R190028A) and a UK Research & Innovation Future Leaders Fellowship, United Kingdom (MR/T019050/1).

References

Akil, H., Huang, A.S., Francis, B.A., Sadda, S.R., Chopra, V., 2017. Retinal vessel density from optical coherence tomography angiography to differentiate early glaucoma, pre-perimetric glaucoma and normal eyes. PLoS One 12 (2), e0170476.
Budai, A., Bock, R., Maier, A., Hornegger, J., Michelson, G., 2013. Robust vessel segmentation in fundus images. Int. J. Biomed. Imaging 2013.
Chang, R., Nelson, A.J., LeTran, V., Vu, B., Burkemper, B., Chu, Z., Fard, A., Kashani, A.H., Xu, B.Y., Wang, R.K., et al., 2019. Systemic determinants of peripapillary vessel density in healthy African Americans: the African American eye disease study. Am. J. Ophthalmol. 207, 240–247.
Chen, C., Chuah, J.H., Ali, R., Wang, Y., 2021a. Retinal vessel segmentation using deep learning: a review. IEEE Access 9, 111985–112004.
Chen, X., Williams, B.M., Vallabhaneni, S.R., Czanner, G., Williams, R., Zheng, Y., 2019. Learning active contour models for medical image segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 11632–11640.
Chen, W., Yu, S., Ma, K., Ji, W., Bian, C., Chu, C., Shen, L., Zheng, Y., 2021b. TW-GAN: Topology and width aware GAN for retinal artery/vein classification. Med. Image Anal. 102340.
Chen, W., Yu, S., Wu, J., Ma, K., Bian, C., Chu, C., Shen, L., Zheng, Y., 2020. TR-GAN: Topology ranking GAN with triplet loss for retinal artery/vein classification. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, pp. 616–625.
Cheung, N., Donaghue, K.C., Liew, G., Rogers, S.L., Wang, J.J., Lim, S.-W., Jenkins, A.J., Hsu, W., Lee, M.L., Wong, T.Y., 2009. Quantitative assessment of early diabetic retinopathy using fractal analysis. Diabetes Care 32 (1), 106–110.
Cheung, C.Y., Xu, D., Cheng, C.-Y., Sabanayagam, C., Tham, Y.C., Yu, M., Rim, T.H., Chai, C.Y., Gopinath, B., Mitchell, P., et al., 2021. A deep-learning system for the assessment of cardiovascular disease risk via the measurement of retinal-vessel calibre. Nat. Biomed. Eng. 5 (6), 498–508.
Cheung, C.Y.l., Zheng, Y., Hsu, W., Lee, M.L., Lau, Q.P., Mitchell, P., Wang, J.J., Klein, R., Wong, T.Y., 2011. Retinal vascular tortuosity, blood pressure, and cardiovascular risk factors. Ophthalmology 118 (5), 812–818.
Chua, J., Chin, C.W.L., Hong, J., Chee, M.L., Le, T.T., Ting, D.S.W., Wong, T.Y., Schmetterer, L., 2019. Impact of hypertension on retinal capillary microvasculature using optical coherence tomographic angiography. J. Hypertens. 37 (3), 572.
Dashtbozorg, B., Mendonça, A.M., Campilho, A., 2013. An automatic graph-based approach for artery/vein classification in retinal images. IEEE Trans. Image Process. 23 (3), 1073–1083.
De Fauw, J., Ledsam, J.R., Romera-Paredes, B., Nikolov, S., Tomasev, N., Blackwell, S., Askham, H., Glorot, X., O'Donoghue, B., Visentin, D., et al., 2018. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nature Med. 24 (9), 1342–1350.
Estrada, R., Allingham, M.J., Mettu, P.S., Cousins, S.W., Tomasi, C., Farsiu, S., 2015. Retinal artery-vein classification via topology estimation. IEEE Trans. Med. Imaging 34 (12), 2518–2534.
Falconer, K., 2004. Fractal Geometry: Mathematical Foundations and Applications. John Wiley & Sons.
Fraz, M.M., Welikala, R., Rudnicka, A.R., Owen, C.G., Strachan, D., Barman, S.A., 2015. QUARTZ: Quantitative analysis of retinal vessel topology and size - an automated system for quantification of retinal vessels morphology. Expert Syst. Appl. 42 (20), 7221–7234.
Galdran, A., Anjos, A., Dolz, J., Chakor, H., Lombaert, H., Ayed, I.B., 2022. State-of-the-art retinal vessel segmentation with minimalistic models. Sci. Rep. 12 (1), 1–13.
Galdran, A., Meyer, M., Costa, P., Campilho, A., et al., 2019. Uncertainty-aware artery/vein classification on retinal images. In: 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI). IEEE, pp. 556–560.
Hemelings, R., Elen, B., Stalmans, I., Van Keer, K., De Boever, P., Blaschko, M.B., 2019. Artery–vein segmentation in fundus images using a fully convolutional network. Comput. Med. Imaging Graph. 76, 101636.
Hu, Q., Abràmoff, M.D., Garvin, M.K., 2013. Automated separation of binary overlapping trees in low-contrast color retinal images. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, pp. 436–443.
Hu, X., Li, F., Samaras, D., Chen, C., 2019. Topology-preserving deep image segmentation. Adv. Neural Inf. Process. Syst. 32.
Huang, F., Dashtbozorg, B., ter Haar Romeny, B.M., 2018. Artery/vein classification using reflection features in retina fundus images. Mach. Vis. Appl. 29 (1), 23–34.
Joshi, V.S., Reinhardt, J.M., Garvin, M.K., Abramoff, M.D., 2014. Automated method for identification and artery-venous classification of vessel trees in retinal vessel networks. PLoS One 9 (2), e88061.
Kervadec, H., Dolz, J., Tang, M., Granger, E., Boykov, Y., Ayed, I.B., 2019. Constrained-CNN losses for weakly supervised segmentation. Med. Image Anal. 54, 88–99.
Kingma, D.P., Ba, J., 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Li, L., Verma, M., Nakashima, Y., Kawasaki, R., Nagahara, H., 2020. Joint learning of vessel segmentation and artery/vein classification with post-processing. In: Medical Imaging with Deep Learning.
Li, M., Zhao, J., Zhang, Y., He, D., Zhou, J., Jia, J., She, H., Li, Q., Zhang, L., 2019. Automated classification of arterioles and venules for retina fundus images using dual deeply-supervised network. In: International Workshop on Multiscale Multimodal Medical Imaging. Springer, pp. 59–67.
Liew, G., Mitchell, P., Rochtchina, E., Wong, T.Y., Hsu, W., Lee, M.L., Wainwright, A., Wang, J.J., 2011. Fractal analysis of retinal microvasculature and coronary heart disease mortality. Eur. Heart J. 32 (4), 422–429.
Liew, G., Wang, J.J., Cheung, N., Zhang, Y.P., Hsu, W., Lee, M.L., Mitchell, P., Tikellis, G., Taylor, B., Wong, T.Y., 2008. The retinal vasculature as a fractal: methodology, reliability, and relationship to blood pressure. Ophthalmology 115 (11), 1951–1956.
Mirsharif, Q., Tajeripour, F., Pourreza, H., 2013. Automated characterization of blood vessels as arteries and veins in retinal images. Comput. Med. Imaging Graph. 37 (7–8), 607–617.
Mookiah, M.R.K., Hogg, S., MacGillivray, T.J., Prathiba, V., Pradeepa, R., Mohan, V., Anjana, R.M., Doney, A.S., Palmer, C.N., Trucco, E., 2021. A review of machine learning methods for retinal blood vessel segmentation and artery/vein classification. Med. Image Anal. 68, 101905.
Niemeijer, M., van Ginneken, B., Abràmoff, M.D., 2009. Automatic classification of retinal vessels into arteries and veins. In: Medical Imaging 2009: Computer-Aided Diagnosis, vol. 7260, SPIE, pp. 422–429.
Niemeijer, M., Xu, X., Dumitrescu, A.V., Gupta, P., Van Ginneken, B., Folk, J.C., Abramoff, M.D., 2011. Automated measurement of the arteriolar-to-venular width ratio in digital color fundus photographs. IEEE Trans. Med. Imaging 30 (11), 1941–1950.
Orlando, J.I., Breda, J.B., Van Keer, K., Blaschko, M.B., Blanco, P.J., Bulant, C.A., 2018. Towards a glaucoma risk index based on simulated hemodynamics from fundus images. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, pp. 65–73.
Owen, C.G., Rudnicka, A.R., Welikala, R.A., Fraz, M.M., Barman, S.A., Luben, R., Hayat, S.A., Khaw, K.T., Strachan, D.P., Whincup, P.H., et al., 2019. Retinal vasculometry associations with cardiometabolic risk factors in the European Prospective Investigation of Cancer—Norfolk study. Ophthalmology 126 (1), 96–106.
Perez-Rovira, A., MacGillivray, T., Trucco, E., Chin, K., Zutis, K., Lupascu, C., Tegolo, D., Giachetti, A., Wilson, P.J., Doney, A., et al., 2011. VAMPIRE: vessel assessment and measurement platform for images of the retina. In: 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE, pp. 3391–3394.
Relan, D., Ballerini, L., Trucco, E., MacGillivray, T., 2019. Using orthogonal locality preserving projections to find dominant features for classifying retinal blood vessels. Multimedia Tools Appl. 78 (10), 12783–12803.
Ronneberger, O., Fischer, P., Brox, T., 2015. U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, pp. 234–241.
Sandoval-Garcia, E., McLachlan, S., Price, A.H., MacGillivray, T.J., Strachan, M.W., Wilson, J.F., Price, J.F., 2021. Retinal arteriolar tortuosity and fractal dimension are associated with long-term cardiovascular outcomes in people with type 2 diabetes. Diabetologia 64 (10), 2215–2227.
Schnier, C., Bush, K., Nolan, J., Sudlow, C., UK Biobank Outcome Adjudication Group, 2017. Definitions of stroke for UK Biobank phase 1 outcomes adjudication.
Seidelmann, S.B., Claggett, B., Bravo, P.E., Gupta, A., Farhad, H., Klein, B.E., Klein, R., Di Carli, M., Solomon, S.D., 2016. Retinal vessel calibers in predicting long-term cardiovascular outcomes: the Atherosclerosis Risk in Communities study. Circulation 134 (18), 1328–1338.
Shi, C., Chen, Y., Kwapong, W.R., Tong, Q., Wu, S., Zhou, Y., Miao, H., Shen, M., Ye, H., 2020. Characterization by fractal dimension analysis of the retinal capillary network in Parkinson disease. Retina 40 (8), 1483–1491.
Shi, D., Lin, Z., Wang, W., Tan, Z., Shang, X., Zhang, X., Meng, W., Ge, Z., He, M., 2022. A deep learning system for fully automated retinal vessel measurement in high throughput image analysis. Front. Cardiovasc. Med. 9.
Shit, S., Paetzold, J.C., Sekuboyina, A., Ezhov, I., Unger, A., Zhylka, A., Pluim, J.P., Bauer, U., Menze, B.H., 2021. clDice: a novel topology-preserving loss function for tubular structure segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 16560–16569.
Srinidhi, C.L., Aparna, P., Rajan, J., 2019. Automated method for retinal artery/vein separation via graph search metaheuristic approach. IEEE Trans. Image Process. 28 (6), 2705–2718.
Staal, J., Abràmoff, M.D., Niemeijer, M., Viergever, M.A., Van Ginneken, B., 2004. Ridge-based vessel segmentation in color images of the retina. IEEE Trans. Med. Imaging 23 (4), 501–509.
Wagner, S.K., Fu, D.J., Faes, L., Liu, X., Huemer, J., Khalid, H., Ferraz, D., Korot, E., Kelly, C., Balaskas, K., et al., 2020. Insights into systemic disease through retinal imaging-based oculomics. Transl. Vis. Sci. Technol. 9 (2), 6.
Wagner, S.K., Hughes, F., Cortina-Borja, M., Pontikos, N., Struyven, R., Liu, X., Montgomery, H., Alexander, D.C., Topol, E., Petersen, S.E., et al., 2022. AlzEye: longitudinal record-level linkage of ophthalmic imaging and hospital admissions of 353 157 patients in London, UK. BMJ Open 12 (3), e058552.
Welikala, R., Foster, P., Whincup, P., Rudnicka, A.R., Owen, C.G., Strachan, D., Barman, S., et al., 2017. Automated arteriole and venule classification using deep learning for retinal images from the UK Biobank cohort. Comput. Biol. Med. 90, 23–32.
Wong, T.Y., Islam, F.A., Klein, R., Klein, B.E., Cotch, M.F., Castro, C., Sharrett, A.R., Shahar, E., 2006. Retinal vascular caliber, cardiovascular risk factors, and inflammation: the Multi-Ethnic Study of Atherosclerosis (MESA). Invest. Ophthalmol. Vis. Sci. 47 (6), 2341–2350.
Wong, T.Y., Mitchell, P., 2004. Hypertensive retinopathy. N. Engl. J. Med. 351 (22), 2310–2317.
Xie, J., Liu, Y., Zheng, Y., Su, P., Hu, Y., Yang, J., Liu, J., Zhao, Y., 2020. Classification of retinal vessels into artery-vein in OCT angiography guided by fundus images. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, pp. 117–127.
Xu, X., Ding, W., Abràmoff, M.D., Cao, R., 2017. An improved arteriovenous classification method for the early diagnostics of various diseases in retinal image. Comput. Methods Programs Biomed. 141, 3–9.
Xu, X., Wang, R., Lv, P., Gao, B., Li, C., Tian, Z., Tan, T., Xu, F., 2018. Simultaneous arteriole and venule segmentation with domain-specific loss function on a new public database. Biomed. Opt. Express 9 (7), 3153–3166.
You, A., Kim, J.K., Ryu, I.H., Yoo, T.K., 2022. Application of generative adversarial networks (GAN) for ophthalmology image domains: a survey. Eye Vis. 9 (1), 1–19.
Yu, J., Xiao, K., Huang, J., Sun, X., Jiang, C., 2017. Reduced retinal vessel density in obstructive sleep apnea syndrome patients: an optical coherence tomography angiography study. Invest. Ophthalmol. Vis. Sci. 58 (9), 3506–3512.
Zamperini, A., Giachetti, A., Trucco, E., Chin, K.S., 2012. Effective features for artery-vein classification in digital fundus images. In: 2012 25th IEEE International Symposium on Computer-Based Medical Systems (CBMS). IEEE, pp. 1–6.
Zhao, Y., Rada, L., Chen, K., Harding, S.P., Zheng, Y., 2015. Automated vessel segmentation using infinite perimeter active contour model with hybrid region information with application to retinal images. IEEE Trans. Med. Imaging 34 (9), 1797–1807.
Zhao, Y., Xie, J., Zhang, H., Zheng, Y., Zhao, Y., Qi, H., Zhao, Y., Su, P., Liu, J., Liu, Y., 2019. Retinal vascular network topology reconstruction and artery/vein classification via dominant set clustering. IEEE Trans. Med. Imaging 39 (2), 341–356.
Zheng, Z., Oda, M., Mori, K., 2021. Graph cuts loss to boost model accuracy and generalizability for medical image segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 3304–3313.
Zhou, Y., Wagner, S.K., Chia, M.A., Zhao, A., Woodward-Court, P., Xu, M., Struyven, R., Alexander, D.C., Keane, P.A., 2022. AutoMorph: automated retinal vascular morphology quantification via a deep learning pipeline. Transl. Vis. Sci. Technol. 11 (7), 12.
Zhou, Y., Xu, M., Hu, Y., Lin, H., Jacob, J., Keane, P.A., Alexander, D.C., 2021. Learning to address intra-segment misclassification in retinal imaging. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, pp. 482–492.
