Bueno2022 Article AutomatedCervicalSpinalCordSeg
Bueno2022 Article AutomatedCervicalSpinalCordSeg
Bueno2022 Article AutomatedCervicalSpinalCordSeg
https://doi.org/10.1007/s10278-022-00637-4
ORIGINAL PAPER
Abstract
Magnetic resonance (MR) imaging is the most sensitive clinical tool in the diagnosis and monitoring of multiple sclerosis
(MS) alterations. Spinal cord evaluation has gained interest in this clinical scenario in recent years, but, unlike the brain,
there is a more limited choice of algorithms to assist spinal cord segmentation. Our goal was to investigate and develop an
automatic MR cervical cord segmentation method, enabling automated and seamless spinal cord atrophy assessment and
setting the stage for the development of an aggregated algorithm for the extraction of lesion-related imaging biomarkers.
The algorithm was developed using a real-world MR imaging dataset of 121 MS patients (96 cases used as a training dataset
and 25 cases as a validation dataset). Transversal, 3D T1-weighted gradient echo MR images (TE/TR/FA = 1.7–2.7 ms/5.6–
8.2 ms/12°) were acquired in a 3 T system (Signa HD, GEHC) as standard of care in our clinical practice. Experienced
radiologists supervised the manual labelling, which was considered the ground-truth. The 2D convolutional neural network
consisted of a hybrid residual attention-aware segmentation method trained to delineate the cervical spinal cord. The train-
ing was conducted using a focal loss function, based on the Tversky index to address label imbalance, and an automatic
optimal learning rate finder. Our automated model provided an accurate segmentation, achieving a validation DICE coef-
ficient of 0.904 ± 0.101 compared with the manual delineation. An automatic method for cervical spinal cord segmentation
on T1-weighted MR images was successfully implemented. It will have direct implications serving as the first step for
accelerating the process for MS staging and follow-up through imaging biomarkers.
Keywords Segmentation · MRI · Multiple sclerosis · Deep learning · Residual attention-aware · CNN
Introduction
13
Vol.:(0123456789)
Journal of Digital Imaging
basal ganglia, and the brain stem, as well as the grey matter algorithms has been proven in many domains, such as com-
of the spinal cord. Moreover, neurodegeneration gives rise puter vision [17]. Furthermore, CNNs are known as the
to axonal loss in the white matter and diffuse changes in the state-of-the-art in biomedical image segmentation, demon-
entire grey matter, affecting the brain and spinal cord in a strating to be highly robust to the variation of image qual-
global sense. These changes will finally result in profound ity characteristics [18]. The main innovation of CNN with
brain tissue loss and atrophy, most pronounced in the pro- respect to traditional machine learning models is their ability
gressive stages of the disease [4]. Several neuropathological to learn which filters to apply to the input images to extract
and MR imaging studies have proved the involvement of the the most appropriate features to solve a specific problem,
spinal cord in MS [5–7]. such as image classification.
Furthermore, neurodegeneration of the spinal cord is In the image segmentation domain, the U-Net architecture
considered to be the main pathological cause of irreversible [19] has meant an important improvement in performance.
locomotor disability [8, 9]. In particular, MR imaging of the The success is built on a down-sampling path, to encode
spinal cord has shown consequential signs of axonal loss input images and get context, and an up-sampling path, to
by quantifying cord atrophy, which is defined as the reduc- decode feature maps for a segmentation map and recover
tion of the cord cross-sectional area correlated with physical spatial information. Recent efforts have attempted to develop
disability. In addition, MR images also show cervical cord fully automated methods with deep learning architectures to
T2-weighted hyperintense areas related to parenchyma inju- obtain an accurate spinal cord segmentation, such as the Spi-
ries in more than 90% of MS patients [10]. nal Cord Toolbox (SCT), which is a comprehensive open-
An effective method in the assessment of MS patients source software, dedicated to the processing of spinal cord
is the MR evaluation of spinal cord atrophy and lesions. MR images [20]. Another existing spinal cord automatic
The evaluation of the cross-sectional area of the spinal cord segmentation method (BASICSeg) from axial T2-weighted
volume and lesions act as imaging biomarkers, evaluating MR imaging is based on U-Net architecture and uses pre-
its structural integrity [11]. However, MR imaging stand- processed data obtained from SCT [21]. These methods
ardization and subsequent analysis are particularly difficult use pre-processing techniques such as data transformation,
[12] due to the presence of varying MR acquisition proto- region cropping, feature extraction, and selection, which
cols, which leads to different images (signal-to-noise ratio, usually results in time-consuming steps through the need
contrast-to-noise ratio, temporal, and spatial resolutions) for user interaction, hindering their acceptance of use as
and the presence of artefacts, such as susceptibility, motion, a medical device and, therefore, their clinical application.
chemical shift, ghosting, blurring and Gibbs artefact. These Different pre-processing techniques are often applied to
factors result in special MR images features that limit their also deal with the unbalanced spinal cord images problem.
use without pre-processing steps and also would affect to In the proposed methodology, we handle this limitation
the generalization of a deep learning model that can work through the CNN design and parameterization, so that we
with a huge variety of MR images, one of our main goals. automatically obtain the whole cervical spinal cord seg-
Also, there is a wide range of spinal cord shapes, lengths, mentation from real-world data, i.e. no pre-processed MR
and pathologies. All these aspects introduce complexity into images, thus avoiding time-consuming techniques which
spinal cord evaluation and segmentation. entails the need of human interaction and enabling the use
Spinal cord localization, atrophy measurement, and of common data acquired in MR systems used in usual hos-
focal areas of inflammatory demyelination detection can pital practice. Nowadays, it is easy to find real-world data-
be depicted by segmentation techniques [13]. Several auto- sets which have irregular class distributions, such as those
matic segmentation methods have been proposed [13–23]. encountered in the MS spinal cord segmentation task; i.e.
Whereas these methods have shown good performance [13], most of the data instances belong to one class and only a few
they often require initial specifications or landmarks and instances belong to others. That is why training CNNs on
are restricted to specific MR image acquisition protocols as very unbalanced datasets is an important research challenge
mentioned previously [14]. Some of their limitations relate [22, 23]. Data with < 1% of spinal cord voxels remains a
to traditional intensity-based deterministic algorithms, continued issue of concern and needs to be addressed [24].
which are limited by factors relating to changing sensitivity To overcome this issue, a residual attention-aware
between MR images [15]. Lesion intensities can be confused mechanism [25] was used, which has become renowned in
with normal structures due to non-optimal image quality the fields of computer vision [26] and image classification
[15]. Moreover, the lack of validation against multicentre [27]. This technique generates attention-aware features that
data or cases with spinal cord damage has limited their change adaptively as layers going deeper and are based
application in large clinical studies [16]. on spatial features to solve the imbalanced data problem.
In recent years, outperformance of convolutional neural Furthermore, the residual units reduce the vanishing gradi-
networks (CNNs) as compared to traditional deterministic ent problem which take place in networks with too many
13
Journal of Digital Imaging
layers, decreasing their performance and stability as the a scientific computer with an Intel i7 processor, working at
value of gradients approaches zero in early layers. 2.2 GHz and 16 Gb of RAM memory. The model training was
Further to a proof-of-concept study developed with a carried out on a Nvidia GeForce GTX 2080 with 8 Gb memory.
preliminary methodology in a reduced cohort [28], in this Early stopping callback of Keras was used to monitor
extended work, we aim to present a novel approach for auto- the validation loss at each epoch during the training process
mated cervical spinal cord segmentation from MR, based on and impose a stop criterion (if the validation loss had not
hybrid residual attention-aware mechanisms that, together improved in the last ten epochs, training was interrupted).
with a focal loss function [29] with the Tversky-index [30] The proposed method was divided into 4 main steps
as main metric, addresses the problems of unbalanced anno- (Fig. 1). After image acquisition, data was resized and nor-
tation and MR imaging pre-processing. We evaluate the per- malized to avoid heterogeneous image intensities. Then, all
formance of this methodology by its application to a dataset the MR images were manually segmented through the delin-
of real-world MR images (3D-T1 weighted images acquired eation of the cervical spinal cord, leading to the ground-truth
in a 3 T system) acquired in patients suffering from MS. dataset. Finally, part of the acquired data was used to train
the designed CNN and validation metrics were obtained in
the rest of the dataset (validation dataset).
Materials and Methods
A) Datasets
All models and methods were implemented in Python 3.7
and Anaconda framework using Keras (v2.2.4) and Tensor- The ethics committee approved the observational
Flow (v1.14.0) libraries. Model training was carried out by non-interventional study for which the dataset was col-
Fig. 1 Pipeline designed to build the proposed automatic cervical generation (step 3), to finally train and validate a CNN-based archi-
spinal cord segmentation. Going from the dataset generation (step 1), tecture (step 4)
going through the data normalization (step 2) and expert ground-truth
13
Journal of Digital Imaging
lected. The informed consent was waived. Anonymised nearest-neighbour interpolation. Manual labelling was
MR exams of 121 MS patients from the hospital Picture performed slice by slice by two radiologists (with 12
Archiving and Communication System (PACS) were and 30 years of experience, respectively) using the ITK-
collected and used for the development and validation SNAP tool [33]. Discrepancies were solved by consen-
of the proposed algorithm. sus. The resulting cervical spinal cord mask was consid-
The population series consisted of 34 men and 87 ered as being a ground-truth.
women, with a mean age of 45 years old (the range being
23 to 74). In terms of MS clinical course, 70% of the D) Cervical Spinal Cord Segmentation Network Training
patients were diagnosed as relapsing–remitting (RR), 16% and Hyperparameter Tuning
as secondary-progressive (SP), 8% as primary-progressive
(PP), 5% as clinically isolated syndrome (CIS), and 1% as To reach our goal of obtaining the desired cervical spinal
progressive-relapsing (1%). We do not know a priori the cord segmentation, a 2D residual attention-aware U-Net,
number of patients with presence or not of lesions. based on a residual attention mechanism and U-Net connec-
3D T1-weighted gradient echo, axial MR images (TE/ tions, was used as architecture. The residual block enables
TR/FA = 1.7–2.7 ms/5.6–8.2 ms/12°) were acquired on a deeper network by having hundreds of layers, whereas the
various 3 T scanners (Signa HDxt and Optima MR360, attention mechanism learns to focus on relevant locations.
GE Healthcare). All MR images had 1-mm slice Regarding training design, it was made with an automatic
thickness and an in-plane isotropic resolution from optimal learning rate finder, a focal loss function based on
0.43 × 0.43 to 1.02 × 1.02 mm. the Tversky-index (FTL) [29] to address the problem of label
This dataset was randomly divided into two different imbalance and the Matthews correlation coefficient (MCC)
groups, one for training (80%, 96 cases, 15,668 images) as metrics during training [34].
and the other one for validation (20%, 25 cases, 4,260 Finally, to evaluate the performance of both the repro-
images), following the traditional distribution based on ducibility of manual segmentations and the spatial overlap
the Pareto principle [31]. accuracy of automatic segmentation, the DICE coefficient
We selected MR images acquired in routine clini- [35] was used as statistical validation metric.
cal practice for reproducibility and feasibility consid-
erations. Cervical spinal cord region is selected for our
analysis since accumulating evidence supports routine Residual Attention‑Aware U‑net architecture
measurement of cervical spine (C-spine) area loss in MS
[32] We are aware that better cervical spinal cord seg- To solve the high-class imbalance problem (proportion of
mentations can be obtained through very high-resolution spinal cord compared to the rest of the image), and due
images, movement synchronism and higher acquisition to its ability to improve the model performance, a “very
times not used in most clinical centres. deep” architecture was selected for the cervical spinal cord
segmentation.
B) Data Normalization In the selected network, residual blocks were used in the
different U-net layers, except the first and the last layers, to
All MR 2D images were resized to a common shape solve the gradient vanishing problem by using identity map-
of 256 × 256 pixels using a bicubic algorithm interpola- pings as the skip connections (Fig. 2). Because the residual
tion over 4 × 4 pixels. To avoid inconsistent tissue inten- units directly propagate features from early to late convolu-
sities in the acquisition process, image normalization at tion, an improvement was noted in the model performance.
slice level was used to set the pixel values between 0 and The stacked residual blocks were introduced to mitigate
1, as in (1): the effect of network degradation on model learning by using
identity mappings as the skip connections.
z(i, j) = (x(i, j) − xmin )∕(xmax − xmin ) (1) The residual unit was made up of a block that contains a
where z(i, j) was the normalized pixel value, x(i, j) the batch normalization layer (BN), an activation (ReLU) layer,
original pixel intensity, and xmin , xmax the minimum and and a convolutional layer, repeated three times. The detailed
maximum image intensity values, respectively. residual unit is shown in Fig. 3.
The batch normalization was employed for each convolu-
C) Cervical Spinal Cord Ground-Truth Generation tional layer and yields the mean and standard deviation to nor-
malize the activation of current layer before advancing to the
To harmonize the image size, ground-truth images next layer. A convolutional identity mapping connection was
were also resized to a shape of 256 × 256 pixels using a used to ensure the accuracy as the network goes “deeper” [36].
13
Journal of Digital Imaging
Fig. 2 Overall network architecture based on residual blocks and attention gates. Residual blocks allow a deeper network by solving the vanish-
ing gradient problem. Attention mechanisms solve the unbalanced data problem by learning to focus on locations that are relevant
Residual attention learning divided the attention module unbalanced data and small regions of interest (ROIs) such as
into a trunk branch, used to process the original features, the spinal cord, FN detections need to be weighted higher than
and a soft branch, used to construct the identity mapping, FPs to improve recall rate. This was addressed by the focal
enhancing the good features, and reducing the noise from Tversky loss index, which controls the FP and FN and controls
the trunk branch. the loss contribution in training examples, with difficult ROI
Applied to our domain, by using this attention residual detection.
mechanism, the original feature information was preserved 1
through the trunk branch and focused on cervical spinal cord
∑
FTLc = (1 − TIc ) 𝛾
relevant locations by the soft mask branch. c
13
Journal of Digital Imaging
Fig. 4 Triangular learning rate policy. Orange line: learning rate value
changing between bounds. Step size is the number of iterations in half
a cycle
baselr and maxlr (Fig. 4). Keras learning rate finder technique
was used to automatically find a suitable maxlr and baselr for
cyclical learning rate scheduling and apply it to our experi-
ments (Fig. 5).
Fig. 6 Loss over learning rate to identify optimal CLR bounds, maxlr
We started by training during 2 to 5 epochs to see the :loss starts to increase, baselr : loss starts to decrease
overall behaviour of our model loss function over time. The
lower boundary was set to a small value (1e−10 ) to let the
network learn while the upper boundary was defined with Results
a too large value (1e1 ) for our model to learn. After training
completion, a smoothed loss over time was plotted (Fig. 5), An initial cervical spinal cord segmentation was per-
enabling us to define when the learning rate was just large formed, applying residual attention-aware U-net archi-
enough for loss to decrease (optimal lower bound) and then tecture with focal Tversky as loss function and setting
too large, to the point where loss starts to increase (optimal the learning rate with the CLR method while training.
upper bound). Finally, the network was fully trained along Our model provided an automated and accurate method,
all the desired epochs. achieving a MCC of 0.95 as training metric and a DICE
In Fig. 6, we can see the plot of the overall behaviour of coefficient of 0.9 as statistical validation metric. The
the loss function during 2 epochs, which led us to determine model was trained using a batch size of 15 along 30
the optimal CLR bounds for the whole training. In our case, epochs, defined with an Early Stopping callback (the train-
maxlr = 1e−3 (loss started to increase) and baselr = 1e−5 ing stops after 10 epochs of not training loss improve-
(loss started to decrease). ment). Figure 8 shows how the MCC and loss obtained
Once the final CLR bounds were determined, the net- evolved over the different training epochs on the validation
work was trained from scratch varying the learning rate in a set of our final model.
linear manner between the minimum (baselr ) and the maxi- The trained method was validated on the 20% of the
mum ( maxlr ) (Fig. 7). In this way, we were able to reduce dataset described above (validation dataset), where the
the work in setting learning rates and improve the network target region class is significantly smaller than the back-
performance. ground class.
13
Journal of Digital Imaging
Cyclic Learning Rate (CLR) a slight improvement over those trained with U-net, since
0,0012 deep networks have a higher learning capability. Finally, the
0,001
results show that residual attention-aware U-net architec-
ture with a CLR finder has the best performance, i.e., high
0,0008
DICE coefficient, low standard deviation (STD), and high,
Learning rate
0,8
0,6
Loss/MCC
0,4
val_mahews_correlaon val_loss
0,2
0
0 5 10 15 20 25 30
Epoch
13
Journal of Digital Imaging
Table 1 Results and comparison between model and loss function schemes on validation dataset
Model Loss function Parameters Learning rate Mean DICE ± STD MCC P25 Median (P50) P75
U-net *LH TL α = 0.4, β = 0.6 1e−5 0.66 ± 0.24 0.68 0.57 0.75 0.84
U-net TL α = 0.3, β = 0.7 1e −5 0.68 ± 0.24 0.71 0.61 0.77 0.85
U-net FTL γ = 1.33, α = 0.7, β = 0.3 1e−5 0.62 ± 0.25 0.64 0.53 0.71 0.81
U-net FTL γ = 1.33, α = 0.6, β = 0.4 1e−5 0.68 ± 0.27 0.70 0.62 0.80 0.87
U-net FTL γ = 1.6, α = 0.6, β = 0.4 1e−5 0.74 ± 0.23 0.77 0.70 0.82 0.88
RA-U-net FTL γ = 1.33, α = 0.6, β = 0.4 1e −5 0.86 ± 0.11 0.89 0.85 0.89 0.92
RA-U-net FTL γ = 1.6, α = 0.6, β = 0.4 1e−4 0.80 ± 0.22 0.84 0.82 0.87 0.90
Deep U-net FTL γ = 1.33, α = 0.6, β = 0.4 CLR 0.79 ± 0.25 0.83 0.82 0.89 0.92
(maxlr = 1e−2 , baselr = 1e−4
RA-U-net **RH FTL γ = 1.33, α = 0.6, β = 0.4 CLR 0.904 ± 0.101 0.95 0.90 0.92 0.94
(maxlr = 1e−3 , baselr = 1e−5
Column “Loss function”: Tversky loss (TL), focal Tversky loss (FTL). Column “Parameters” refers to loss function data. Column “Models”:
U-net, RA-U-net, deep U-net [38]. Column “MCC”: Matthews correlation coefficient, final training metric. Column “P25”: 25th percentile. Col-
umn “Median (P50)”: median DICE and 50th percentile. Column “P75”: 75th percentile
*LH left histogram, **RH right histogram
overlap of both is shown in Fig. 11d. The red voxels identi- Fig. 12 where a segmentation error example is shown. In
fied the mismatched area (pixels that are only present in one 3D and sagittal views, we can see two inhomogeneities
of the masks) and the green voxels indicate the matching that are surrounded in yellow. These segmentation failures,
area (pixels that belong to both the manual and predicted which appear as two spikes toward the end of the cervi-
mask). cal spinal cord, are caused by the settings in slices 35,
According to Fig. 11, only one random example from 36, and 37. In these images, we can see that the cervical
more than 2000 axial images that make up the validation spinal cord portion is very close to another anatomical
dataset is shown. It is observed that our model achieves structure of similar intensity that seem to be practically
a successful automatic prediction which hardly differs connected. That is why the prediction mask is bigger than
from that performed by the expert radiologist. Despite those obtained for slices 34, 38, and 39 (where the cervical
good overall performance, there are some situations that spinal cord region is more separated from that anatomical
show that our automatic segmentation tool can make some structure) and, therefore, the segmentation mask is fitted to
prediction mistakes. One example of that can be seen on the cervical spinal cord region and there is no confusion.
Fig. 9 DICE distribution histogram. a First model distribution, scattered distribution. b Final model distribution, narrow distribution
13
Journal of Digital Imaging
13
Journal of Digital Imaging
Fig. 12 Poor segmentation performance setting. Slices 35, 36, and 37 and 39 show a well-delimited cervical spinal cord region which leads
show the cervical spinal cord very close to another anatomical struc- us to a correct segmentation mask, and these correspond to the adja-
ture that could cause oversized prediction masks and is the last inho- cent peak areas
mogeneity that is surrounded in 3D and sagittal views. Slices 34, 38,
2D-dilated convolutions detects the spinal cord centreline (when carrying out back-propagation) and the degradation
being necessary human interaction to initialize the spinal issue, which results in poor training results. The residual
cord location (e.g., three points in the spinal cord) for an blocks were stacked in the U-net layers, except the first and
optimal performance and also carrying out the next CNN last layers, to solve the vanishing gradient problem by allow-
for spinal cord segmentation in a framework that is more ing the network to have hundreds of layers. The attention
sensitive to the quality of that detection module. mechanism learns to focus on locations that are relevant by
There is another method that depends on the previous allowing the retention of the original feature information
one, for automatic segmentation of the spinal cord after trau- through the trunk branch and paying attention to those spi-
matic contusion injury from axial T2-weighted MR imag- nal cord regions by the soft mask branch. The training was
ing and whose DICE coefficient is 0.93. Firstly, the optic designed with a focal loss function, based on the Tversky
algorithm [39] is used from the SCT to first detect the spinal index, to address the problem of label imbalance in medical
cord centreline and create a square mask around it. Secondly, image segmentation and an automatic optimal learning rate
three different novel 2D-CNN architectures are applied to finder. Finally, a DICE coefficient of 0.904 was achieved.
achieve the segmentation. Moving away from our method As we mentioned during the paper, the usefulness of this
generalization purpose, this study cohort is derived from a model, and as future work in development, this segmentation
single institution with all the imaging being performed on a algorithm will allow the quantification of the cervical spinal
single MR imaging scanner using similar parameters, poten- cord volume and, therefore, atrophy assessment through the
tially biasing the results. Furthermore, the pre-processing comparison of the obtained volume against that associated
step of cropping the image around the spinal cord centreline to a healthy population. In addition, this model will help to
depends on a third-party tool. develop an aggregated lesion detection model from STIR
In our study, overcoming what we understand as other image sequences which will be focused on the lesion locali-
methods limitations and with the aim of avoiding any type of zation on the cervical spine region, since it would be used as
pre-processing techniques and reducing human interaction, first step by obtaining the cervical spinal cord location in the
a 2D residual attention-aware U-Net, based on a residual corresponding STIR space after applying image registration
attention mechanism and U-Net connections, was imple- algorithms between both sequences.
mented to obtain the desired spinal cord segmentation. We Limitations detected during our study will be investi-
selected a deep architecture since it is proved that deeper gated in future works. Some of them primarily relate to the
networks have a higher learning capacity, although they relatively small sample size of patients. Consequently, to
show some drawbacks like the gradient vanishing problem proceed with a K-fold cross-validation scheme would have
13
Journal of Digital Imaging
been reasonable as it ensures that every observation from our Funding This work was funded by a Generalitat Valenciana PhD fel-
limited dataset has the chance of appearing in training and lowship (grant number ACIF/2017/057) and the Universitat Politècnica
de València and Polytechnic La Fe Hospital research project Deep-
validation set. A performance comparison between exist- Medul (grant number 2018/0274) (Deep Learning for spinal cord seg-
ing tools would better clarify the quality of our proposed mentation in Multiple Sclerosis).
model compared to others although this can sometimes
be complicated since not all of them work with the same Data Availability Not applicable.
MR image types. The need for external validation is one of
them; our method should be externally validated in other Code Availability Not applicable.
centres using different MR scanners from different vendors,
magnetic fields, and acquisition parameters. In this way, we Declarations
will be able to evaluate the interoperability of our method.
Ethics Approval This research study was conducted retrospectively
Increasing patient numbers will improve our database and from data obtained for observational purposes. An official waiver of
lead us to a greater number of training samples, therefore, ethical approval was granted from GIBI230 − IISLAFE of the Hos-
achieving fuller training and better performance. pital Universitari I Politècnic of Valencia.
13
Journal of Digital Imaging
in multiple sclerosis. Acta Neurol Scand 136:330–337. https:// 27. Huang P, Wang J, Zhang J, et al (2020) Attention-aware Residual
doi.org/10.1111/ane.12729 Network based Manifold Learning for White Blood Cells Clas-
10. Valsasina P, Rocca MA, Horsfield MA, et al (2013) Regional cervical sification. IEEE J Biomed Heal Informatics 1–1. https://doi.org/
cord atrophy and disability in multiple sclerosis: A voxel-based analy- 10.1109/jbhi.2020.3012711
sis. Radiology 266:853–861. https://doi.org/10.1148/radiol.12120813 28. Bueno Gómez A, Alberich-Bayarri A, Bosch I, Carreres Polo J
11. Filippi M, Agosta F (2010) Imaging biomarkers in multiple scle- (2021) Automatic MR Spinal Cord Segmentation by Hybrid Residual
rosis. J. Magn. Reson. Imaging 31:770–788 Attention-Aware Convolutional Neural Networks and Learning Rate
12. Yiannakas MC, Mustafa AM, De Leener B, et al (2016) Fully Optimization on Real World Data. In: IFMBE Proceedings. Springer
automated segmentation of the cervical cord from T1-weighted Science and Business Media Deutschland GmbH, pp 158–168
MRI using PropSeg: Application to multiple sclerosis. NeuroIm- 29. Abraham N, Khan NM (2018) A Novel Focal Tversky loss func-
age Clin 10:71–77. https://doi.org/10.1016/j.nicl.2015.11.001 tion with improved Attention U-Net for lesion segmentation. Proc
13. De Leener B, Taso M, Cohen-Adad J, Callot V (2016) Segmenta- - Int Symp Biomed Imaging 2019-April:683–687
tion of the human spinal cord. Magn. Reson. Mater. Physics, Biol. 30. Salehi SSM, Erdogmus D, Gholipour A (2017) Tversky loss func-
Med. 29:125–153 tion for image segmentation using 3D fully convolutional deep
14. Horsfield MA, Sala S, Neema M, et al (2010) Rapid semi-automatic networks. In: Lecture Notes in Computer Science (including sub-
segmentation of the spinal cord from magnetic resonance images: series Lecture Notes in Artificial Intelligence and Lecture Notes
Application in multiple sclerosis. Neuroimage 50:446–455. https:// in Bioinformatics). Springer Verlag, pp 379–387
doi.org/10.1016/j.neuroimage.2009.12.121 31. The Pareto Principle | Dunford | The Plymouth Student Scien-
15. Chen M, Carass A, Oh J, et al (2013) Automatic magnetic reso- tist. https://bcur.org/journals/index.php/TPSS/article/view/408.
nance spinal cord segmentation with topology constraints for vari- Accessed 9 Dec 2019
able fields of view. Neuroimage 83:1051–1062. https://d oi.o rg/1 0. 32. Mina Y, Azodi S, Dubuche T, et al (2021) Cervical and thoracic
1016/j.neuroimage.2013.07.060 cord atrophy in multiple sclerosis phenotypes: Quantification and
16. Jovicich J, Czanner S, Greve D, et al (2006) Reliability in multi- correlation with clinical disability. NeuroImage Clin 30:102680.
site structural MRI studies: Effects of gradient non-linearity cor- https://doi.org/10.1016/J.NICL.2021.102680
rection on phantom and human data. Neuroimage 30:436–443. 33. Yushkevich PA, Gerig G (2017) ITK-SNAP: An Intractive Medi-
https://doi.org/10.1016/j.neuroimage.2005.09.046 cal Image Segmentation Tool to Meet the Need for Expert-Guided
17. Szlávik Z, Szirányi T (2004) Face Analysis Using CNN-UM Segmentation of Complex Medical Images. IEEE Pulse 8:54–57.
18. Litjens G, Kooi T, Bejnordi BE, et al (2017) A survey on deep https://doi.org/10.1109/MPUL.2017.2701493
learning in medical image analysis. Med. Image Anal. 42:60–88 34. Chicco D, Jurman G The advantages of the Matthews correlation
19. Ronneberger O, Fischer P, Brox T (2015) U-net: Convolutional networks coefficient (MCC) over F1 score and accuracy in binary classifica-
for biomedical image segmentation. In: Lecture Notes in Computer Sci- tion evaluation. https://doi.org/10.1186/s12864-019-6413-7
ence (including subseries Lecture Notes in Artificial Intelligence and 35. Eelbode T, Bertels J, Berman M, et al (2020) Optimization for
Lecture Notes in Bioinformatics). Springer Verlag, pp 234–241 Medical Image Segmentation: Theory and Practice When Evaluat-
20. De Leener B, Lévy S, Dupont SM, et al (2017) SCT: Spinal Cord ing With Dice Score or Jaccard Index. IEEE Trans Med Imaging
Toolbox, an open-source software for processing spinal cord MRI 39:3679–3690. https://doi.org/10.1109/TMI.2020.3002417
data. Neuroimage 145:24–43. https://doi.org/10.1016/j.neuroimage. 36. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for
2016.10.009 image recognition. In: Proceedings of the IEEE Computer Society
21. McCoy DB, Dupont SM, Gros C, et al (2019) Convolutional neu- Conference on Computer Vision and Pattern Recognition. IEEE
ral network–based automated segmentation of the spinal cord and Computer Society, pp 770–778
contusion injury: Deep learning biomarker correlates of motor 37. Smith LN (2017) Cyclical learning rates for training neural net-
impairment in acute spinal cord injury. Am J Neuroradiol 40:737– works. In: Proceedings - 2017 IEEE Winter Conference on Appli-
744. https://doi.org/10.3174/ajnr.A6020 cations of Computer Vision, WACV 2017. Institute of Electrical
22. Lin T-Y, Goyal P, Girshick R, et al (2017) Focal Loss for Dense and Electronics Engineers Inc., pp 464–472
Object Detection 38. Li R, Liu W, Yang L, et al (2017) DeepUNet: A Deep Fully Con-
23. Montahaei E, Ghorbani M, Baghshah MS, Rabiee HR (2018) volutional Network for Pixel-level Sea-Land Segmentation. IEEE
Adversarial Classifier for Imbalanced Problems J Sel Top Appl Earth Obs Remote Sens 11:3954–3962
24. Buda M, Maki A, Mazurowski MA (2017) A systematic study of 39. Gros C, De Leener B, Dupont SM, et al (2018) Automatic spinal
the class imbalance problem in convolutional neural networks. cord localization, robust to MRI contrasts using global curve opti-
https://doi.org/10.1016/j.neunet.2018.07.011 mization. Med Image Anal 44:215–227. https://d oi.o rg/1 0.1 016/j.
25. Wang F, Jiang M, Qian C, et al (2017) Residual attention network media.2017.12.001
for image classification. In: Proceedings - 30th IEEE Conference
on Computer Vision and Pattern Recognition, CVPR 2017. Insti- Publisher's Note Springer Nature remains neutral with regard to
tute of Electrical and Electronics Engineers Inc., pp 6450–6458 jurisdictional claims in published maps and institutional affiliations.
26. Gao L, Li Y, Ning J (2019) Residual attention convolutional net-
work for online visual tracking. IEEE Access 7:94097–94105.
https://doi.org/10.1109/ACCESS.2019.2927791
13